Abstract

In a reachability-time game, players Min and Max choose moves so that the time to reach a final state 
in a timed automaton is minimised or maximised, respectively. Asarin and Maler showed decidability of 
reachability-time games on strongly non-Zeno timed automata using a value iteration algorithm. This paper 
complements their work by providing a strategy improvement algorithm for the problem. It also generalizes 
their decidability result because the proposed strategy improvement algorithm solves reachability-time games 
on all timed automata. The exact computational complexity of solving reachability-time games is also established:
the problem is EXPTIME-complete for timed automata with at least two clocks.


1 Introduction



Timed automata [3] are a fundamental formalism for modelling and analysis of real-time systems. They have
a rich theory, solid modelling and verification tool support [23, 17, 19], and they have been successfully applied to
numerous industrial case studies. Timed automata are finite automata augmented by a finite number of continuous
real variables which are called clocks because their values increase with time at unit rate. Every clock can be
reset to an integer constant when a transition of the automaton is performed, and clock values can be compared to
integers to constrain availability of transitions. Adding clocks to finite automata increases their expressive power,
and the fundamental reachability problem is PSPACE-complete for timed automata [3]. The natural optimization
problems of minimizing and maximizing reachability-time in timed automata are also in PSPACE [14].

The reachability (or optimal reachability-time) problems in timed automata are fundamental to the verification
of (quantitative timing) properties of systems modeled by timed automata [3]. On the other hand, the problem of
control-program synthesis for real-time systems can be cast as two-player reachability (or optimal reachability-time)
games, where the two players, say Min and Max, correspond to the "controller" and the "environment",
respectively, and control-program synthesis corresponds to computing winning (or optimal) strategies for Min. In
other words, for control-program synthesis we need to generalize optimization problems to competitive optimization
problems. Reachability games [5] and reachability-time games [4] on timed automata are decidable. The
former problem is EXPTIME-complete, but the elegant result of Asarin and Maler [4] for reachability-time games
is limited to the class of strongly non-Zeno timed automata and no upper complexity bounds are given. A recent
result of Henzinger and Prabhu [16] is that values of reachability-time games can be approximated for all timed
automata, but computability of the exact values was left open.

A generalization of timed automata to priced (or weighted) timed automata [7] allows a rich variety of applications,
e.g., to scheduling [6, 1, 22, 24]. While the fundamental minimum reachability-price problem is PSPACE-complete
[6, 8], the two-player reachability-price games are undecidable on priced timed automata with at least
three clocks [10]. The reachability-price games are, however, decidable for priced timed automata with one
clock [12], and on the class of strongly price-non-Zeno priced timed automata [2, 11].

Our contribution. We show that the exact values of reachability-time games on arbitrary timed automata are 
uniformly computable; here uniformity means that the output of our algorithm allows us, for every starting state, 
to compute in constant time the value of the game starting from this state. In particular, unlike the paper of Asarin 
and Maler [4], we do not require timed automata to be strongly non-Zeno. We also establish the exact complexity
of reachability-time games: they are EXPTIME-complete and two clocks are sufficient for EXPTIME-hardness. 
For the latter result we reduce from a recently discovered EXPTIME-complete problem of countdown games [18].

We believe that the novel proof techniques used in this paper are an important contribution. We characterize the
values of the game by optimality equations and then we use strategy improvement to solve them. This allows us
to obtain an elementary and constructive proof of the fundamental determinacy result for reachability-time games,
which at the same time yields an efficient algorithm matching the EXPTIME lower bound for the problem. These
techniques were known for finite-state systems [21, 25], but we are not aware of any earlier algorithmic results
based on optimality equations and strategy improvement for real-time systems such as timed automata.

Related and future work. A recent, concurrent, and independent work [13] establishes decidability of slightly
different and more challenging reachability-time games "with the element of surprise" [15, 16]. In our model of
timed games players take turns to take unilateral decisions about the duration and type of subsequent game moves. 
Games with surprise are more general in two ways: in every round of the game players have a "time race" to be 
the first to perform a move; moreover, players are forbidden to use strategies which "stop the time", because such 
strategies are arguably physically unrealistic and result in Zeno runs. 

We conjecture that our principal technique of optimality equations and strategy improvement can be generalized 
to give an EXPTIME algorithm for reachability-time games with surprise, and we are currently working on it. We 
also believe that this technique is applicable to many other (competitive) optimization problems on (priced) timed 
automata and even on restricted classes of hybrid automata; we are currently working on optimality equations and 
strategy improvement for, e.g., average-time games on timed automata and on o-minimal hybrid systems [9].

2 Reachability-time games 

We assume that, wherever appropriate, sets $\mathbb{N}$ of non-negative integers and $\mathbb{R}$ of reals contain a maximum element $\infty$, and we write $\mathbb{N}_{>0}$ for the set of positive integers and $\mathbb{R}_{\geq 0}$ for the set of non-negative reals. For $n \in \mathbb{N}$, we write $[\![n]\!]_{\mathbb{N}}$ for the set $\{0, 1, \ldots, n\}$, and $[\![n]\!]_{\mathbb{R}}$ for the set $\{r \in \mathbb{R} : 0 \leq r \leq n\}$ of non-negative reals bounded by $n$. For $r \in \mathbb{R}_{\geq 0}$, we write $\lfloor r \rfloor$ for its integer part, and we write $\{r\}$ for its fractional part. For sets $X$ and $Y$, we write $[X \to Y]$ for the set of functions $F : X \to Y$, and $[X \rightharpoonup Y]$ for the set of partial functions $F : X \rightharpoonup Y$.

Timed automata. Fix a constant $k \in \mathbb{N}$ for the rest of this paper. Let $C$ be a finite set of clocks. A ($k$-bounded) clock valuation is a function $\nu : C \to [\![k]\!]_{\mathbb{R}}$; we write $V$ for the set $[C \to [\![k]\!]_{\mathbb{R}}]$ of clock valuations. If $\nu \in V$ and $t \in \mathbb{R}_{\geq 0}$ then we write $\nu + t$ for the clock valuation defined by $(\nu + t)(c) = \nu(c) + t$, for all $c \in C$. For a set $C' \subseteq C$ of clocks and a clock valuation $\nu : C \to \mathbb{R}_{\geq 0}$, we define $\mathrm{Reset}(\nu, C')(c) = 0$ if $c \in C'$, and $\mathrm{Reset}(\nu, C')(c) = \nu(c)$ if $c \notin C'$.

The set of clock constraints over the set of clocks $C$ is the set of conjunctions of simple clock constraints, which are constraints of the form $c \bowtie i$ or $c - d \bowtie i$, where $c, d \in C$, $i \in [\![k]\!]_{\mathbb{N}}$, and ${\bowtie} \in \{<, >, =, \leq, \geq\}$. Note that there are finitely many simple clock constraints and hence the set of non-equivalent clock constraints is finite. For every clock valuation $\nu \in V$, let $\mathrm{CC}(\nu)$ be the set of simple clock constraints which hold in $\nu$. A clock region is a maximal set $P \subseteq V$, such that for all $\nu, \nu' \in P$, we have $\mathrm{CC}(\nu) = \mathrm{CC}(\nu')$. In other words, clock regions are equivalence classes of the equivalence relation relating clock valuations which are indistinguishable by clock constraints. Observe that $\nu$ and $\nu'$ are in the same clock region iff all clocks have the same integer parts in $\nu$ and $\nu'$, and if the partial orders of the clocks determined by their fractional parts in $\nu$ and $\nu'$ are the same. For all $\nu \in V$, we write $[\nu]$ for the clock region of $\nu$.
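The characterization of clock regions just given (integer parts plus the ordering of fractional parts) suggests a canonical key for the region $[\nu]$. The Python sketch below is only an illustration: the names `region_key` and `same_region` are hypothetical, valuations are assumed to be dictionaries of exact `Fraction` values bounded by $k$, and the set of clocks with zero fractional part is recorded explicitly, as in the standard region construction.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Tuple

Valuation = Dict[str, Fraction]  # clock name -> value in [0, k] (assumed encoding)
RegionKey = Tuple[Tuple[Tuple[str, int], ...], FrozenSet[str], Tuple[FrozenSet[str], ...]]

def region_key(nu: Valuation) -> RegionKey:
    """Canonical description of the clock region [nu].

    Records the integer part of every clock, the clocks whose fractional
    part is 0, and the clocks with positive fractional part grouped by
    equal fractional part, in increasing order of that fractional part.
    """
    ints = tuple(sorted((c, v // 1) for c, v in nu.items()))  # Fraction // 1 is an int
    frac = {c: v - v // 1 for c, v in nu.items()}
    zero = frozenset(c for c, f in frac.items() if f == 0)
    levels = sorted({f for f in frac.values() if f != 0})
    order = tuple(frozenset(c for c, f in frac.items() if f == lvl) for lvl in levels)
    return ints, zero, order

def same_region(nu1: Valuation, nu2: Valuation) -> bool:
    """True iff nu1 and nu2 lie in the same clock region."""
    return region_key(nu1) == region_key(nu2)
```

For example, with two clocks the valuations $(x, y) = (0.3, 1.5)$ and $(0.4, 1.6)$ fall into the same region, while $(0.3, 1.2)$ does not, because the order of the fractional parts of $x$ and $y$ is reversed.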

A clock zone is a convex set of clock valuations which is a union of a set of clock regions. Note that a set of clock valuations is a zone iff it is definable by a clock constraint. For $W \subseteq V$, we write $\overline{W}$ for the closure of the set $W$, i.e., the smallest closed set in $V$ which contains $W$. Observe that for every clock zone $W$, the set $\overline{W}$ is also a clock zone.

Let $L$ be a finite set of locations. A configuration is a pair $(\ell, \nu)$, where $\ell \in L$ is a location and $\nu \in V$ is a clock valuation; we write $Q$ for the set of configurations. If $s = (\ell, \nu) \in Q$ and $c \in C$, then we write $s(c)$ for $\nu(c)$. A region is a pair $(\ell, P)$, where $\ell \in L$ is a location and $P$ is a clock region. If $s = (\ell, \nu)$ is a configuration then we write $[s]$ for the region $(\ell, [\nu])$. We write $\mathcal{R}$ for the set of regions. A set $Z \subseteq Q$ is a zone if for every $\ell \in L$, there is a clock zone $W_\ell$, such that $Z = \{(\ell, \nu) : \ell \in L \text{ and } \nu \in W_\ell\}$. For a region $R = (\ell, P) \in \mathcal{R}$, we write $\overline{R}$ for the zone $\{(\ell, \nu) : \nu \in \overline{P}\}$.

A timed automaton $\mathcal{T} = (L, C, S, A, E, \delta, \rho, F)$ consists of a finite set of locations $L$, a finite set of clocks $C$, a set of states $S \subseteq Q$, a finite set of actions $A$, an action enabledness function $E : A \to 2^S$, a transition function $\delta : L \times A \to L$, a clock reset function $\rho : A \to 2^C$, and a set of final states $F \subseteq S$. We further require that $S$, $F$, and $E(a)$ for all $a \in A$, are zones.

For a configuration $s = (\ell, \nu) \in Q$ and $t \in \mathbb{R}_{\geq 0}$, we define $s + t$ to be the configuration $s' = (\ell, \nu + t)$ if $\nu + t \in V$, and we then write $s \xrightarrow{t} s'$. We write $s \rightarrow_t s'$ if $s \xrightarrow{t} s'$ and for all $t' \in [0, t]$, we have $(\ell, \nu + t') \in S$. For an action $a \in A$, we define $\mathrm{Succ}(s, a)$ to be the configuration $s' = (\ell', \nu')$, where $\ell' = \delta(\ell, a)$ and $\nu' = \mathrm{Reset}(\nu, \rho(a))$, and we then write $s \xrightarrow{a} s'$. We write $s \rightarrow_a s'$ if $s \xrightarrow{a} s'$; $s, s' \in S$; and $s \in E(a)$. For technical convenience and without loss of generality we will assume throughout that timed automata satisfy the requirement that for every $s \in S$, there exist $a \in A$ and $s' \in S$, such that $s \rightarrow_a s'$.

For $s, s' \in S$, we say that $s'$ is in the future of $s$, or equivalently, that $s$ is in the past of $s'$, if there is $t \in \mathbb{R}_{\geq 0}$, such that $s \rightarrow_t s'$; we then write $s \rightarrow_* s'$. For $R, R' \in \mathcal{R}$, we say that $R'$ is in the future of $R$, or that $R$ is in the past of $R'$, if there is $s \in R$ and there is $s' \in R'$, such that $s'$ is in the future of $s$; we then write $R \rightarrow_* R'$. We say that $R'$ is the time successor of $R$ if $R \rightarrow_* R'$, $R \neq R'$, and for every $R'' \in \mathcal{R}$, we have that $R \rightarrow_* R'' \rightarrow_* R'$ implies $R'' = R$ or $R'' = R'$; we then write $R \rightarrow_{+1} R'$ or $R' \leftarrow_{+1} R$. Similarly, for $R, R' \in \mathcal{R}$, we write $R \rightarrow_a R'$ if there is $s \in R$, and there is $s' \in R'$, such that $s \rightarrow_a s'$.

We say that a region $R \in \mathcal{R}$ is thin if for every $s \in R$ and every $\varepsilon > 0$, we have that $[s] \neq [s + \varepsilon]$; other regions are called thick; we write $\mathcal{R}_{\mathrm{Thin}}$ and $\mathcal{R}_{\mathrm{Thick}}$ for the sets of thin and thick regions, respectively. Note that if $R \in \mathcal{R}_{\mathrm{Thick}}$ then for every $s \in R$, there is an $\varepsilon > 0$, such that $[s] = [s + \varepsilon]$. Observe also that the time successor of a thin region is thick and vice versa.
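To make thinness and the time-successor relation $\rightarrow_{+1}$ concrete, here is a hedged Python sketch over a hypothetical encoding of clock regions: integer parts, the set of clocks with zero fractional part, and the remaining clocks grouped by increasing fractional part. It assumes all clocks are bounded by $k$ and returns `None` when letting time pass would leave the set $V$ of $k$-bounded valuations; `is_thin` and `time_successor` are illustrative names, not part of the paper.

```python
from typing import FrozenSet, Optional, Tuple

# Hypothetical encoding of a clock region:
#   ints  - sorted (clock, integer part) pairs
#   zero  - clocks whose fractional part is 0
#   order - clocks with positive fractional part, grouped by equal
#           fractional part, in increasing order of that fractional part
Region = Tuple[Tuple[Tuple[str, int], ...], FrozenSet[str], Tuple[FrozenSet[str], ...]]

def is_thin(region: Region) -> bool:
    # Every positive delay changes the region iff some clock sits on an integer.
    _, zero, _ = region
    return bool(zero)

def time_successor(region: Region, k: int) -> Optional[Region]:
    ints, zero, order = region
    int_map = dict(ints)
    if zero:
        # Thin -> thick: the clocks on an integer become the clocks with the
        # smallest positive fractional part.
        if any(int_map[c] == k for c in zero):
            return None  # any delay would exceed the bound k
        return ints, frozenset(), (frozenset(zero),) + order
    if not order:
        return None  # no clocks at all
    # Thick -> thin: the clocks with the largest fractional part reach the
    # next integer first.
    last = order[-1]
    for c in last:
        int_map[c] += 1
    return tuple(sorted(int_map.items())), frozenset(last), order[:-1]
```

Starting from the region in which both clocks equal $0$, repeated calls alternate between thin and thick regions, which matches the observation above.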

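A timed action is a pair $\tau = (a, t) \in A \times \mathbb{R}_{\geq 0}$. For $s \in Q$, we define $\mathrm{Succ}(s, \tau) = \mathrm{Succ}(s, (a, t))$ to be the configuration $s' = \mathrm{Succ}(s + t, a)$, i.e., such that $s \xrightarrow{t} s'' \xrightarrow{a} s'$, and we then write $s \xrightarrow{a}_{t} s'$. We write $s \rightarrow^{a}_{t} s'$ if $s \rightarrow_t s'' \rightarrow_a s'$. If $\tau = (a, t)$ then we write $s \xrightarrow{\tau} s'$ instead of $s \xrightarrow{a}_{t} s'$, and $s \rightarrow_{\tau} s'$ instead of $s \rightarrow^{a}_{t} s'$.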

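A finite run of a timed automaton is a sequence $\langle s_0, \tau_1, s_1, \tau_2, \ldots, \tau_n, s_n \rangle \in S \times ((A \times \mathbb{R}_{\geq 0}) \times S)^*$, such that for all $i$, $1 \leq i \leq n$, we have $s_{i-1} \rightarrow_{\tau_i} s_i$. For a finite run $r = \langle s_0, \tau_1, s_1, \tau_2, \ldots, \tau_n, s_n \rangle$, we define $\mathrm{Length}(r) = n$, and we define $\mathrm{Last}(r) = s_n$ to be the state in which the run ends. We write $\mathrm{Runs}_{\mathrm{fin}}$ for the set of finite runs. An infinite run of a timed automaton is a sequence $r = \langle s_0, \tau_1, s_1, \tau_2, \ldots \rangle$, such that for all $i \geq 1$, we have $s_{i-1} \rightarrow_{\tau_i} s_i$. For an infinite run $r$, we define $\mathrm{Length}(r) = \infty$. For a run $r = \langle s_0, \tau_1, s_1, \tau_2, \ldots \rangle$, we define $\mathrm{Stop}(r) = \inf\{i : s_i \in F\}$ and $\mathrm{Time}(r) = \sum_{i=1}^{\mathrm{Length}(r)} t_i$; and we define $\mathrm{RT}(r) = \sum_{i=1}^{\mathrm{Stop}(r)} t_i$ if $\mathrm{Stop}(r) < \infty$, and $\mathrm{RT}(r) = \infty$ if $\mathrm{Stop}(r) = \infty$, where for all $i \geq 1$, we have $\tau_i = (a_i, t_i)$.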
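The definition of $\mathrm{RT}(r)$ only sums the delays up to the first visit to $F$. A minimal Python sketch for a finite run, assuming the run is given as a list of states and a list of timed actions $(a_i, t_i)$, and that membership in $F$ is decided by a caller-supplied predicate (the name `reachability_time` is hypothetical):

```python
import math
from typing import Callable, Hashable, Sequence, Tuple

TimedAction = Tuple[str, float]  # tau_i = (a_i, t_i)

def reachability_time(states: Sequence[Hashable],
                      actions: Sequence[TimedAction],
                      is_final: Callable[[Hashable], bool]) -> float:
    """RT(r) for a finite run r = <s_0, tau_1, s_1, ..., tau_n, s_n>."""
    if len(states) != len(actions) + 1:
        raise ValueError("a run alternates states and timed actions")
    elapsed = 0.0
    for i, s in enumerate(states):
        if is_final(s):            # i = Stop(r), the first index with s_i in F
            return elapsed         # t_1 + ... + t_i
        if i < len(actions):
            elapsed += actions[i][1]
    return math.inf                # Stop(r) = inf of the empty set = infinity
```

Delays taken after the first visit to $F$ do not contribute, so they are counted in $\mathrm{Time}(r)$ but not in $\mathrm{RT}(r)$.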
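Strategies. A reachability-time game $\Gamma$ is a triple $(\mathcal{T}, L_{\mathrm{Min}}, L_{\mathrm{Max}})$, where $\mathcal{T} = (L, C, S, A, E, \delta, \rho, F)$ is a timed automaton and $(L_{\mathrm{Min}}, L_{\mathrm{Max}})$ is a partition of $L$. We define $Q_{\mathrm{Min}} = \{(\ell, \nu) \in Q : \ell \in L_{\mathrm{Min}}\}$, $Q_{\mathrm{Max}} = Q \setminus Q_{\mathrm{Min}}$, $S_{\mathrm{Min}} = S \cap Q_{\mathrm{Min}}$, $S_{\mathrm{Max}} = S \setminus S_{\mathrm{Min}}$, $\mathcal{R}_{\mathrm{Min}} = \{[s] : s \in Q_{\mathrm{Min}}\}$, and $\mathcal{R}_{\mathrm{Max}} = \mathcal{R} \setminus \mathcal{R}_{\mathrm{Min}}$.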

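A strategy for Min is a function $\mu : \mathrm{Runs}_{\mathrm{fin}} \to A \times \mathbb{R}_{\geq 0}$, such that if $\mathrm{Last}(r) = s \in S_{\mathrm{Min}}$ and $\mu(r) = \tau$ then $s \rightarrow_{\tau} s'$, where $s' = \mathrm{Succ}(s, \tau)$. Similarly, a strategy for Max is a function $\chi : \mathrm{Runs}_{\mathrm{fin}} \to A \times \mathbb{R}_{\geq 0}$, such that if $\mathrm{Last}(r) = s \in S_{\mathrm{Max}}$ and $\chi(r) = \tau$ then $s \rightarrow_{\tau} s'$, where $s' = \mathrm{Succ}(s, \tau)$. We write $\Sigma_{\mathrm{Min}}$ and $\Sigma_{\mathrm{Max}}$ for the sets of strategies for Min and Max, respectively. If players Min and Max use strategies $\mu$ and $\chi$, respectively, then the $(\mu, \chi)$-run from a state $s$ is the unique run $\mathrm{Run}(s, \mu, \chi) = \langle s_0, \tau_1, s_1, \tau_2, \ldots \rangle$, such that $s_0 = s$, and for every $i \geq 0$, if $s_i \in S_{\mathrm{Min}}$ or $s_i \in S_{\mathrm{Max}}$, then $\mu(\mathrm{Run}_i(s, \mu, \chi)) = \tau_{i+1}$ or $\chi(\mathrm{Run}_i(s, \mu, \chi)) = \tau_{i+1}$, respectively, where $\mathrm{Run}_i(s, \mu, \chi) = \langle s_0, \tau_1, s_1, \ldots, s_{i-1}, \tau_i, s_i \rangle$.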

We say that a strategy $\mu$ for Min is positional if for all finite runs $r, r' \in \mathrm{Runs}_{\mathrm{fin}}$, we have that $\mathrm{Last}(r) = \mathrm{Last}(r')$ implies $\mu(r) = \mu(r')$. A positional strategy for Min can then be represented as a function $\mu : S_{\mathrm{Min}} \to A \times \mathbb{R}_{\geq 0}$, which uniquely determines the strategy $\mu^{\infty} \in \Sigma_{\mathrm{Min}}$ as follows: $\mu^{\infty}(r) = \mu(\mathrm{Last}(r))$, for all finite runs $r \in \mathrm{Runs}_{\mathrm{fin}}$. Positional strategies for Max are defined and represented in the analogous way. We write $\Pi_{\mathrm{Min}}$ and $\Pi_{\mathrm{Max}}$ for the sets of positional strategies for Min and for Max, respectively.
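Because positional strategies only look at $\mathrm{Last}(r)$, the $(\mu, \chi)$-run can be generated one step at a time. The sketch below is a hedged illustration under several assumptions: states are abstract hashable values, `succ` is a caller-supplied stand-in for $\mathrm{Succ}(s, \tau)$, the strategies are assumed to propose only legal timed actions (nothing is checked), and the infinite run is truncated after `max_steps` moves, so a result of `math.inf` only means that $F$ was not reached within that bound. The name `play` is hypothetical.

```python
import math
from typing import Callable, Hashable, Tuple

State = Hashable
TimedAction = Tuple[str, float]  # tau = (a, t)

def play(s0: State,
         owner: Callable[[State], str],                  # "Min" or "Max"
         mu: Callable[[State], TimedAction],             # positional strategy of Min
         chi: Callable[[State], TimedAction],            # positional strategy of Max
         succ: Callable[[State, TimedAction], State],    # stand-in for Succ(s, tau)
         is_final: Callable[[State], bool],
         max_steps: int = 10_000) -> float:
    """Reachability time of the (mu, chi)-run from s0, simulated for at most
    max_steps moves; returns math.inf if no final state is seen by then."""
    s, elapsed = s0, 0.0
    for _ in range(max_steps):
        if is_final(s):
            return elapsed
        tau = mu(s) if owner(s) == "Min" else chi(s)  # depends only on Last(r) = s
        elapsed += tau[1]
        s = succ(s, tau)
    return elapsed if is_final(s) else math.inf
```

For instance, in a hypothetical one-clock game where Max waits $0.5$ in location `l0` and Min then waits until the clock equals $2$ before moving to `goal`, the computed reachability time is $2$.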

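Value of reachability-time game and optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\Gamma)$. For every $s \in S$, we define its upper value $\overline{\mathrm{Val}}(s)$ and its lower value $\underline{\mathrm{Val}}(s)$ by $\overline{\mathrm{Val}}(s) = \inf_{\mu \in \Sigma_{\mathrm{Min}}} \sup_{\chi \in \Sigma_{\mathrm{Max}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi))$, and $\underline{\mathrm{Val}}(s) = \sup_{\chi \in \Sigma_{\mathrm{Max}}} \inf_{\mu \in \Sigma_{\mathrm{Min}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi))$. The inequality $\underline{\mathrm{Val}}(s) \leq \overline{\mathrm{Val}}(s)$ always holds. A reachability-time game is determined if for every $s \in S$, its lower and upper values are equal to each other; then we say that the value $\mathrm{Val}(s)$ exists and $\mathrm{Val}(s) = \underline{\mathrm{Val}}(s) = \overline{\mathrm{Val}}(s)$. For strategies $\mu \in \Sigma_{\mathrm{Min}}$ and $\chi \in \Sigma_{\mathrm{Max}}$, we define $\mathrm{Val}^{\mu}(s) = \sup_{\chi \in \Sigma_{\mathrm{Max}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi))$, and $\mathrm{Val}_{\chi}(s) = \inf_{\mu \in \Sigma_{\mathrm{Min}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi))$. For an $\varepsilon > 0$, we say that a strategy $\mu \in \Sigma_{\mathrm{Min}}$ or $\chi \in \Sigma_{\mathrm{Max}}$ is $\varepsilon$-optimal if for every $s \in S$, we have $\mathrm{Val}^{\mu}(s) \leq \mathrm{Val}(s) + \varepsilon$ or $\mathrm{Val}_{\chi}(s) \geq \mathrm{Val}(s) - \varepsilon$, respectively. Note that if a game is determined then for every $\varepsilon > 0$, both players have $\varepsilon$-optimal strategies.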

We say that a reachability-time game is positionally determined if for every $s \in S$, we have $\mathrm{Val}(s) = \inf_{\mu \in \Pi_{\mathrm{Min}}} \sup_{\chi \in \Sigma_{\mathrm{Max}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi))$ and $\mathrm{Val}(s) = \sup_{\chi \in \Pi_{\mathrm{Max}}} \inf_{\mu \in \Sigma_{\mathrm{Min}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi))$. Note that if the reachability-time game is positionally determined then for every $\varepsilon > 0$, both players have positional $\varepsilon$-optimal strategies. Our results (Lemma 2, Theorem 6, and Theorem 18) yield a constructive proof of the following fundamental result for reachability-time games.

Theorem 1 (Positional determinacy). Reachability-time games are positionally determined. 

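Let $\Gamma$ be a reachability-time game, and let $T : S \to \mathbb{R}$ and $D : S \to \mathbb{N}$. We write $(T, D) \models \mathrm{Opt}^{\mathrm{MinMax}}(\Gamma)$, and we say that $(T, D)$ is a solution of optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\Gamma)$, if for all $s \in S$, we have:

• if $D(s) = \infty$ then $T(s) = \infty$; and if $s \in F$ then $(T(s), D(s)) = (0, 0)$;

• if $s \in S_{\mathrm{Min}} \setminus F$ then $T(s) = \inf_{a,t}\{t + T(s') : s \rightarrow^{a}_{t} s'\}$, and $D(s) = \min\{1 + d' : T(s) = \inf_{a,t}\{t + T(s') : s \rightarrow^{a}_{t} s' \text{ and } D(s') = d'\}\}$; and

• if $s \in S_{\mathrm{Max}} \setminus F$ then $T(s) = \sup_{a,t}\{t + T(s') : s \rightarrow^{a}_{t} s'\}$, and $D(s) = \max\{1 + d' : T(s) = \sup_{a,t}\{t + T(s') : s \rightarrow^{a}_{t} s' \text{ and } D(s') = d'\}\}$.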
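To see how the two components $T$ and $D$ interact, consider a discrete analogue in which every state has finitely many moves, each given by a delay $t$ and a successor $s'$, so that the infimum and supremum become a minimum and a maximum. The Python checker below is a hypothetical illustration of that analogue only; it does not handle the uncountably many timed actions of a real timed automaton. $D$ records, among the $T$-optimal moves, the smallest (for Min) or largest (for Max) number of steps to $F$.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[float, str]  # (delay t, successor s') in a hypothetical finite game

def satisfies_opt(edges: Dict[str, List[Edge]], owner: Dict[str, str], final: Set[str],
                  T: Dict[str, float], D: Dict[str, float]) -> bool:
    """Check a finite, discrete analogue of Opt^MinMax (inf/sup become min/max)."""
    for s, out in edges.items():
        if D[s] == math.inf and T[s] != math.inf:
            return False
        if s in final:
            if (T[s], D[s]) != (0, 0):
                return False
            continue
        pick = min if owner[s] == "Min" else max
        best = pick(t + T[v] for t, v in out)
        # D(s): optimum of 1 + D(s') over the moves that attain the optimal time T(s).
        steps = pick(1 + D[v] for t, v in out if t + T[v] == best)
        if (T[s], D[s]) != (best, steps):
            return False
    return True
```

For instance, if a Min state `a` can move to the final state `g` with delay 1 or to a Max state `b` with delay 0, and `b` can reach `g` with delay 1 or 2, then $T(b) = 2$, $T(a) = \min(1, 0 + 2) = 1$ and $D(a) = D(b) = 1$.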

Lemma 2 ($\varepsilon$-Optimal strategies from optimality equations). If $(T, D) \models \mathrm{Opt}^{\mathrm{MinMax}}(\Gamma)$, then for all $s \in S$, we have $\mathrm{Val}(s) = T(s)$ and for every $\varepsilon > 0$, both players have positional $\varepsilon$-optimal strategies.

Simple functions and simple timed actions. Let $X \subseteq Q$. A function $F : X \to \mathbb{R}$ is simple if either: there is $e \in \mathbb{Z}$, such that for every $s \in X$, we have $F(s) = e$; or there are $e \in \mathbb{Z}$ and $c \in C$, such that for every $s \in X$, we have $F(s) = e - s(c)$.

Let $X \subseteq Q$ be convex and let $F : X \to \mathbb{R}$ be a continuous function. We write $\overline{F}$ for the unique continuous function $F' : \overline{X} \to \mathbb{R}$, such that for all $s \in X$, we have $F'(s) = F(s)$. Observe that if $F$ is simple, then $\overline{F}$ is simple. For functions $F, F' : X \to \mathbb{R}$ we define functions $\max(F, F'), \min(F, F') : X \to \mathbb{R}$ by $\max(F, F')(s) = \max\{F(s), F'(s)\}$ and $\min(F, F')(s) = \min\{F(s), F'(s)\}$, for every $s \in X$.

Lemma 3. Let $F, F' : R \to \mathbb{R}$ be simple functions defined on a region $R \in \mathcal{R}$. Then either $\min(F, F') = F$ and $\max(F, F') = F'$, or $\min(F, F') = F'$ and $\max(F, F') = F$. In particular, both $\min(F, F')$ and $\max(F, F')$ are simple functions.
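Lemma 3 has a practical consequence: to know which of two simple functions is the minimum on a whole region, it suffices to compare them at a single state of that region. The Python sketch below illustrates this under a hypothetical encoding, where a simple function is a pair `(e, c)` standing for $e - s(c)$, or for the constant $e$ when `c` is `None`, and a state is represented only by its clock values; the names `evaluate` and `min_on_region` are illustrative.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

# Hypothetical encoding: (e, c) is the simple function s -> e - s(c),
# and (e, None) is the constant simple function s -> e.
Simple = Tuple[int, Optional[str]]

def evaluate(f: Simple, s: Dict[str, Fraction]) -> Fraction:
    e, c = f
    return Fraction(e) if c is None else e - s[c]

def min_on_region(f: Simple, g: Simple, witness: Dict[str, Fraction]) -> Simple:
    """Return whichever of f, g equals min(f, g) on the region of `witness`.

    By Lemma 3 one of the two functions lies below the other on the whole
    region, so comparing them at a single state of the region suffices."""
    return f if evaluate(f, witness) <= evaluate(g, witness) else g
```

The maximum is obtained symmetrically. The single-witness test is sound because, within one region, two simple functions that agree at one state agree everywhere in the region: the relevant clock values and clock differences are either constant or range over an open interval between consecutive integers.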

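Define the finite set of simple timed actions $\mathcal{A} = A \times [\![k]\!]_{\mathbb{N}} \times C$. For $s \in Q$ and $\alpha = (a, b, c) \in \mathcal{A}$ we define $t(s, \alpha) = b - s(c)$ if $s(c) \leq b$, and $t(s, \alpha) = 0$ if $s(c) > b$; and we define $\mathrm{Succ}(s, \alpha)$ to be the state $s' = \mathrm{Succ}(s, \tau(\alpha))$, where $\tau(\alpha) = (a, t(s, \alpha))$; we then write $s \xrightarrow{\alpha} s'$. We also write $s \rightarrow_{\alpha} s'$ if $s \rightarrow_{\tau(\alpha)} s'$. Note that if $\alpha \in \mathcal{A}$ and $s \xrightarrow{\alpha} s'$ then $[s'] \in \mathcal{R}_{\mathrm{Thin}}$. Observe that for every thin region $R' \in \mathcal{R}_{\mathrm{Thin}}$, there is a number $b \in [\![k]\!]_{\mathbb{N}}$ and a clock $c \in C$, such that for every $R \in \mathcal{R}$ in the past of $R'$, we have that $s \in R$ implies $(s + (b - s(c))) \in R'$; we then write $R \rightarrow_{b,c} R'$. For $\alpha = (a, b, c) \in \mathcal{A}$ and $R, R' \in \mathcal{R}$, we write $R \xrightarrow{\alpha} R'$ or $R \xrightarrow{a}_{b,c} R'$, if $R \rightarrow_{b,c} R'' \rightarrow_{a} R'$, for some $R'' \in \mathcal{R}_{\mathrm{Thin}}$. For $\alpha \in \mathcal{A}$ and $R, R' \in \mathcal{R}$, if $R \xrightarrow{\alpha} R'$ and $F : \overline{R'} \to \mathbb{R}$ then we define the functions $F^{\oplus}_{\alpha} : \overline{R} \to \mathbb{R}$ and $F^{\boxplus}_{\alpha} : \overline{R} \to \mathbb{R}$ by $F^{\oplus}_{\alpha}(s) = t(s, \alpha) + F(\mathrm{Succ}(s, \alpha))$ and $F^{\boxplus}_{\alpha}(s) = 1 + F(\mathrm{Succ}(s, \alpha))$, for all $s \in \overline{R}$.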
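The waiting time $t(s, \alpha)$ and the successor $\mathrm{Succ}(s, \alpha)$ of a simple timed action can be computed directly from the definitions above. The following Python sketch assumes that a configuration is a pair of a location and a dictionary of exact clock values, and that $\delta$ and $\rho$ are given as dictionaries; `t_of` and `succ_simple` are hypothetical names, and the $k$-bound on valuations is not checked.

```python
from fractions import Fraction
from typing import Dict, Set, Tuple

Config = Tuple[str, Dict[str, Fraction]]   # (location, clock valuation)
SimpleAction = Tuple[str, int, str]        # alpha = (a, b, c)

def t_of(s: Config, alpha: SimpleAction) -> Fraction:
    """t(s, alpha) = b - s(c) if s(c) <= b, and 0 otherwise."""
    _, nu = s
    _, b, c = alpha
    return b - nu[c] if nu[c] <= b else Fraction(0)

def succ_simple(s: Config, alpha: SimpleAction,
                delta: Dict[Tuple[str, str], str], rho: Dict[str, Set[str]]) -> Config:
    """Succ(s, alpha) = Succ(s, (a, t(s, alpha))): wait, then take action a."""
    loc, nu = s
    a = alpha[0]
    t = t_of(s, alpha)
    waited = {c: v + t for c, v in nu.items()}                               # s + t
    reset = {c: (Fraction(0) if c in rho[a] else v) for c, v in waited.items()}
    return delta[(loc, a)], reset
```

For example, from $x = 1/2$, $y = 3/2$ the simple timed action $(a, 2, x)$ waits $3/2$ time units, after which $x = 2$; if $a$ resets $y$, the successor valuation is $x = 2$, $y = 0$.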

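Proposition 4. Let $\alpha \in \mathcal{A}$ and $R, R' \in \mathcal{R}$. If $R \xrightarrow{\alpha} R'$ and $F : \overline{R'} \to \mathbb{R}$ is simple, then $F^{\oplus}_{\alpha}$ is simple.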

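For $a \in A$ and $R, R', R'' \in \mathcal{R}$, if $R \rightarrow_* R'' \rightarrow_a R'$, $s \in R$, and $F : \overline{R'} \to \mathbb{R}$, then we define the partial function $F^{\oplus}_{s,a} : \mathbb{R}_{\geq 0} \rightharpoonup \mathbb{R}$ by $F^{\oplus}_{s,a}(t) = t + F(\mathrm{Succ}(s, (a, t)))$, for all $t \in \mathbb{R}_{\geq 0}$, such that $(s + t) \in R''$; note that the domain $\{t \in \mathbb{R}_{\geq 0} : (s + t) \in R''\}$ of $F^{\oplus}_{s,a}$ is an interval.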

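Proposition 5. Let $a \in A$ and $R, R', R'' \in \mathcal{R}$. If $R \rightarrow_* R'' \rightarrow_a R'$, $s \in R$, and $F : \overline{R'} \to \mathbb{R}$ is simple, then $F^{\oplus}_{s,a} : I \to \mathbb{R}$, where $I = \{t \in \mathbb{R}_{\geq 0} : (s + t) \in R''\}$, is continuous and nondecreasing.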
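The following short calculation illustrates where continuity and monotonicity in Proposition 5 come from; it is a sketch that assumes only the two forms of simple functions from the definition above. Write $s' = \mathrm{Succ}(s, (a, t))$, so that $s'(c) = 0$ if $c \in \rho(a)$ and $s'(c) = s(c) + t$ otherwise. Then

$$
F^{\oplus}_{s,a}(t) = t + F(s') =
\begin{cases}
t + e & \text{if } F \equiv e,\\
t + e & \text{if } F(s') = e - s'(c) \text{ and } c \in \rho(a),\\
e - s(c) & \text{if } F(s') = e - s'(c) \text{ and } c \notin \rho(a),
\end{cases}
$$

so on the interval $I$ the function is either constant or has slope $1$.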

3 Timed region graph 

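Timed region graph $\widehat{\Gamma}$. Let $\Gamma = (\mathcal{T}, L_{\mathrm{Min}}, L_{\mathrm{Max}})$ be a reachability-time game. We define the timed region graph $\widehat{\Gamma}$ to be the finite edge-labelled graph $(\mathcal{R}, \mathcal{M})$, where the set $\mathcal{R}$ of regions of timed automaton $\mathcal{T}$ is the set of vertices, and the labelled edge relation $\mathcal{M} \subseteq \mathcal{R} \times \mathcal{A} \times \mathcal{R}$ is defined in the following way. For $\alpha = (a, b, c) \in \mathcal{A}$ and $R, R' \in \mathcal{R}$ we have $(R, \alpha, R') \in \mathcal{M}$, sometimes denoted by $R \leadsto_{\alpha} R'$, if and only if one of the following conditions holds: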

• there is an $R'' \in \mathcal{R}$, such that $R \rightarrow_{b,c} R'' \rightarrow_a R'$; or

• $R \in \mathcal{R}_{\mathrm{Min}}$, and there are $R'', R''' \in \mathcal{R}$, such that $R \rightarrow_{b,c} R'' \rightarrow_{+1} R''' \rightarrow_a R'$; or

• $R \in \mathcal{R}_{\mathrm{Max}}$, and there are $R'', R''' \in \mathcal{R}$, such that $R \rightarrow_{b,c} R'' \leftarrow_{+1} R''' \rightarrow_a R'$.

Observe that in all the cases above we have that $R'' \in \mathcal{R}_{\mathrm{Thin}}$ and $R''' \in \mathcal{R}_{\mathrm{Thick}}$. The motivation for the second case is the following. Let $R \rightarrow_* R''' \rightarrow_a R'$, where $R \in \mathcal{R}_{\mathrm{Min}}$ and $R''' \in \mathcal{R}_{\mathrm{Thick}}$. One of the main results that we will implicitly establish is that in a state $s \in R$, among all $t \in \mathbb{R}_{\geq 0}$, such that $s + t \in R'''$, the smaller the $t$, the "better" the timed action $(a, t)$ is for player Min. Note, however, that the set $\{t \in \mathbb{R}_{\geq 0} : s + t \in R'''\}$ is an open interval because $R''' \in \mathcal{R}_{\mathrm{Thick}}$, and hence it does not have a smallest element. Therefore, for every $s \in R$, we model the "best" time to wait, when starting from $s$, before performing an $a$-labelled transition from region $R'''$ to region $R'$, by taking the infimum of the set $\{t \in \mathbb{R}_{\geq 0} : s + t \in R'''\}$. Observe that this infimum is equal to the $t_{R''} \in \mathbb{R}_{\geq 0}$, such that $s + t_{R''} \in R''$, where $R'' \rightarrow_{+1} R'''$, and that $t_{R''} = b - s(c)$, where $R \rightarrow_{b,c} R''$. In the timed region graph $\widehat{\Gamma}$, we summarize this model of the "best" timed action from region $R$ to region $R'$ via region $R'''$, by having a move $(R, \alpha, R') \in \mathcal{M}$, where $\alpha = (a, b, c)$. The motivation for the first and the third cases of the definition of $\mathcal{M}$ is similar.
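For a concrete instance of the second case (a hypothetical one-clock example, not taken from the paper), let $C = \{x\}$, $k \geq 2$, $s(x) = 0.3$, and let $R''$ and $R'''$ be the regions with $x = 1$ and $1 < x < 2$ in the location of $s$, so that $[s] \rightarrow_{1,x} R'' \rightarrow_{+1} R'''$. Then

$$\inf\{t \in \mathbb{R}_{\geq 0} : s + t \in R'''\} = \inf\,(0.7, 1.7) = 0.7 = b - s(c) \quad \text{for } (b, c) = (1, x),$$

and this infimum is not attained by any delay that actually lands in $R'''$.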

Regional functions and optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$. Recall from Section 2 that a solution of optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\Gamma)$ for a reachability-time game $\Gamma$ is a pair of functions $(T, D)$, such that $T : S \to \mathbb{R}$ and $D : S \to \mathbb{N}$. Our goal is to define analogous optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$ for the timed region graph $\widehat{\Gamma}$.

If $R \leadsto_{\alpha} R'$, where $R, R' \in \mathcal{R}$ and $\alpha \in \mathcal{A}$, then $s \in R$ does not in general imply that $\mathrm{Succ}(s, \alpha) \in R'$; it is however the case that $s \in R$ implies $\mathrm{Succ}(s, \alpha) \in \overline{R'}$. In order to correctly capture the constraints for successor states which fall out of the "target" region $R'$ of a move of the form $R \leadsto_{\alpha} R'$, we consider, as solutions of optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$, regional functions of types $T : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ and $D : \mathcal{R} \to [S \rightharpoonup \mathbb{N}]$, where for every $R \in \mathcal{R}$, the domain of partial functions $T(R)$ and $D(R)$ is $\overline{R}$. Sometimes, when defining a regional function $F : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$, it will only be natural to define $F(R)$ for all $s \in R$, instead of all $s \in \overline{R}$. This is not a problem, however, because as discussed in Section 2, defining $F(R)$ on the region $R$ uniquely determines the continuous extension of $F(R)$ to $\overline{R}$. For a function $F : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$, we define the function $\widetilde{F} : S \to \mathbb{R}$ by $\widetilde{F}(s) = F([s])(s)$.

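Let $T : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ and let $D : \mathcal{R} \to [S \rightharpoonup \mathbb{N}]$. We write $(T, D) \models \mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$ if for all $s \in S$, we have the following:

• if $s \in F$ then $(\widetilde{T}(s), \widetilde{D}(s)) = (0, 0)$;

• if $s \in S_{\mathrm{Min}} \setminus F$ then $(\widetilde{T}(s), \widetilde{D}(s)) = \min^{\mathrm{lex}}_{m \in \mathcal{M}}\{(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) : m = ([s], \alpha, R')\}$;

• if $s \in S_{\mathrm{Max}} \setminus F$ then $(\widetilde{T}(s), \widetilde{D}(s)) = \max^{\mathrm{lex}}_{m \in \mathcal{M}}\{(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) : m = ([s], \alpha, R')\}$.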
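For a single non-final state $s$, the right-hand sides above are lexicographic optima over the finitely many moves leaving $[s]$. Assuming the pairs $(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s))$ have already been computed for every move, the comparison itself is ordinary tuple comparison in Python, as in this hypothetical helper `lex_opt`, which also returns every move attaining the optimum:

```python
from typing import Hashable, List, Tuple

Option = Tuple[float, int, Hashable]  # (T(R')^+_alpha(s), D(R')^[+]_alpha(s), move m)

def lex_opt(options: List[Option], player: str) -> Tuple[Tuple[float, int], List[Hashable]]:
    """Lexicographic min (for Min) or max (for Max) of the (time, steps) pairs,
    together with all moves attaining it."""
    pairs = [(t, d) for t, d, _ in options]
    best = min(pairs) if player == "Min" else max(pairs)  # tuples compare lexicographically
    return best, [m for t, d, m in options if (t, d) == best]
```

Time is compared first and the step count only breaks ties, so Min prefers the shortest among its time-optimal moves and Max the longest.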

Solutions of $\mathrm{Opt}^{\mathrm{MinMax}}(\Gamma)$ from solutions of $\mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$. In this subsection we show that the function $(T, D) \mapsto (\widetilde{T}, \widetilde{D})$ translates solutions of reachability-time optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$ for the timed region graph $\widehat{\Gamma}$ to solutions of optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\Gamma)$ for the reachability-time game $\Gamma$. In other words, we establish that the function $F \mapsto \widetilde{F}$ is a reduction from the problem of computing values in reachability-time games to the problem of solving optimality equations for timed region graphs. Then in Section 4 we give an algorithm to solve the optimality equations $\mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$.

We say that a function $F : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ is regionally simple or regionally constant, respectively, if for every region $R \in \mathcal{R}$, the function $F(R) : \overline{R} \to \mathbb{R}$ is simple or constant, respectively.

Theorem 6 (Correctness of reduction to timed region graphs). If $(T, D) \models \mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$, $T$ is regionally simple, and $D$ is regionally constant, then $(\widetilde{T}, \widetilde{D}) \models \mathrm{Opt}^{\mathrm{MinMax}}(\Gamma)$.

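Proof. We need to show that for every $s \in S_{\mathrm{Min}} \setminus F$, we have: (a) $\widetilde{T}(s) = \inf_{a,t}\{t + \widetilde{T}(s') : s \rightarrow^{a}_{t} s'\}$, and (b) $\widetilde{D}(s) = \min_{d' \in \mathbb{N}}\{1 + d' : \widetilde{T}(s) = \inf_{a,t}\{t + \widetilde{T}(s') : s \rightarrow^{a}_{t} s' \text{ and } \widetilde{D}(s') = d'\}\}$. The proof of the corresponding equalities for states $s \in S_{\mathrm{Max}} \setminus F$ is similar and omitted. We prove the equality (a) here.

$$
\begin{aligned}
\widetilde{T}(s) &= \min_{m}\{T(R')^{\oplus}_{\alpha}(s) : m = ([s], \alpha, R')\}\\
&= \min\Big\{\min_{\alpha, R'}\{T(R')^{\oplus}_{s,a}(b - s(c)) : [s] \rightarrow_{b,c} R'' \rightarrow_a R'\},\ \min_{\alpha, R'}\{T(R')^{\oplus}_{s,a}(b - s(c)) : [s] \rightarrow_{b,c} R''' \rightarrow_{+1} R'' \rightarrow_a R'\}\Big\}\\
&= \min_{a, R'', R'}\big\{\inf\{T(R')^{\oplus}_{s,a}(t) : [s + t] = R''\} : [s] \rightarrow_* R'' \rightarrow_a R'\big\}\\
&= \min_{a, R'', R'}\big\{\inf\{t + \widetilde{T}(\mathrm{Succ}(s, (a, t))) : [s + t] = R''\} : [s] \rightarrow_* R'' \rightarrow_a R'\big\}\\
&= \inf_{a,t}\{t + \widetilde{T}(s') : s \rightarrow^{a}_{t} s'\}
\end{aligned}
$$

The first equality holds by the assumption that $(T, D) \models \mathrm{Opt}^{\mathrm{MinMax}}(\widehat{\Gamma})$. The second equality holds by the definition of the move relation $\mathcal{M}$ of the timed region graph $\widehat{\Gamma}$, and because if $\alpha = (a, b, c)$ then

$$T(R')^{\oplus}_{\alpha}(s) = b - s(c) + T(R')(\mathrm{Succ}(s, (a, b - s(c)))) = T(R')^{\oplus}_{s,a}(b - s(c)).$$

For the third equality we invoke regional simplicity of $T$, which by Proposition 5 implies that the function $T(R')^{\oplus}_{s,a}$ is continuous and nondecreasing. If either $[s] \rightarrow_{b,c} R'' \rightarrow_a R'$, or $[s] \rightarrow_{b,c} R''' \rightarrow_{+1} R'' \rightarrow_a R'$, then we have that $\inf\{t : [s + t] = R''\} = b - s(c)$, and hence

$$\inf\{T(R')^{\oplus}_{s,a}(t) : [s + t] = R''\} = T(R')^{\oplus}_{s,a}(b - s(c)),$$

because $T(R')^{\oplus}_{s,a}$ is continuous and nondecreasing. The fourth equality holds because $[s + t] = R''$ and $R'' \rightarrow_a R'$ imply that $[\mathrm{Succ}(s, (a, t))] = R'$, and hence $T(R')(\mathrm{Succ}(s, (a, t))) = \widetilde{T}(\mathrm{Succ}(s, (a, t)))$. □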



4 Solving optimality equations by strategy improvement 

Positional strategies. A positional strategy for player Max in a timed region graph $\widehat{\Gamma}$ is a function $\chi : S_{\mathrm{Max}} \to \mathcal{M}$, such that for every $s \in S_{\mathrm{Max}}$, we have $\chi(s) = ([s], \alpha, R)$, for some $\alpha \in \mathcal{A}$ and $R \in \mathcal{R}$. A strategy $\chi : S_{\mathrm{Max}} \to \mathcal{M}$ is regionally constant if for all $s, s' \in S_{\mathrm{Max}}$, we have that $[s] = [s']$ implies $\chi(s) = \chi(s')$; we can then write $\chi([s])$ for $\chi(s)$. Positional strategies for player Min are defined analogously. We write $\Delta_{\mathrm{Max}}$ and $\Delta_{\mathrm{Min}}$ for the sets of positional strategies for players Max and Min, respectively.

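If $\chi \in \Delta_{\mathrm{Max}}$ is regionally constant then we define the strategy subgraph $\widehat{\Gamma}{\restriction}\chi$ to be the subgraph $(\mathcal{R}, \mathcal{M}_{\chi})$ where $\mathcal{M}_{\chi} \subseteq \mathcal{M}$ consists of: all moves $(R, \alpha, R') \in \mathcal{M}$, such that $R \in \mathcal{R}_{\mathrm{Min}}$; and of all moves $m = (R, \alpha, R')$, such that $R \in \mathcal{R}_{\mathrm{Max}}$ and $\chi(R) = m$. The strategy subgraph $\widehat{\Gamma}{\restriction}\mu$ for a regionally constant positional strategy $\mu \in \Delta_{\mathrm{Min}}$ for player Min is defined analogously. We say that $R \in \mathcal{R}$ is choiceless in a timed region graph $\widehat{\Gamma}$ if $R$ has a unique successor in $\widehat{\Gamma}$. We say that $\widehat{\Gamma}$ is 0-player if all $R \in \mathcal{R}$ are choiceless in $\widehat{\Gamma}$; we say that $\widehat{\Gamma}$ is 1-player if either all $R \in \mathcal{R}_{\mathrm{Min}}$ or all $R \in \mathcal{R}_{\mathrm{Max}}$ are choiceless in $\widehat{\Gamma}$; every timed region graph $\widehat{\Gamma}$ is 2-player. Note that if $\chi$ and $\mu$ are positional strategies in $\widehat{\Gamma}$ for players Max and Min, respectively, then $\widehat{\Gamma}{\restriction}\chi$ and $\widehat{\Gamma}{\restriction}\mu$ are 1-player and $(\widehat{\Gamma}{\restriction}\chi){\restriction}\mu$ is 0-player.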

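For functions $T : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ and $D : \mathcal{R} \to [S \rightharpoonup \mathbb{N}]$, and $s \in S$, we define sets $\mathcal{M}^*(s, (T, D))$ and $\mathcal{M}_*(s, (T, D))$ of moves enabled in $s$ which are (lexicographically) $(T, D)$-optimal for player Max and Min, respectively:

$$\mathcal{M}^*(s, (T, D)) = \operatorname*{argmax}^{\mathrm{lex}}_{m \in \mathcal{M}}\{(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) : m = ([s], \alpha, R')\}, \text{ and}$$

$$\mathcal{M}_*(s, (T, D)) = \operatorname*{argmin}^{\mathrm{lex}}_{m \in \mathcal{M}}\{(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) : m = ([s], \alpha, R')\}.$$

Let $\mathrm{Choose} : 2^{\mathcal{M}} \to \mathcal{M}$ be a function such that for every non-empty set of moves $M \subseteq \mathcal{M}$, we have $\mathrm{Choose}(M) \in M$. For regional functions $T : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ and $D : \mathcal{R} \to [S \rightharpoonup \mathbb{N}]$, the canonical $(T, D)$-optimal strategies $\chi_{(T,D)}$ and $\mu_{(T,D)}$ for player Max and Min, respectively, are defined by: $\chi_{(T,D)}(s) = \mathrm{Choose}(\mathcal{M}^*(s, (T, D)))$, for every $s \in S_{\mathrm{Max}}$; and $\mu_{(T,D)}(s) = \mathrm{Choose}(\mathcal{M}_*(s, (T, D)))$, for every $s \in S_{\mathrm{Min}}$.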

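Optimality equations $\mathrm{Opt}(\widehat{\Gamma})$, $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$, $\mathrm{Opt}^{\mathrm{Min}}(\widehat{\Gamma})$, $\mathrm{Opt}^{\geq}(\widehat{\Gamma})$ and $\mathrm{Opt}^{\leq}(\widehat{\Gamma})$. Let $T : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ and $D : \mathcal{R} \to [S \rightharpoonup \mathbb{N}]$. We write $(T, D) \models \mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$ or $(T, D) \models \mathrm{Opt}^{\mathrm{Min}}(\widehat{\Gamma})$, respectively, if for all $s \in F$, we have $(\widetilde{T}(s), \widetilde{D}(s)) = (0, 0)$, and for all $s \in S \setminus F$, we have, respectively:

$$(\widetilde{T}(s), \widetilde{D}(s)) = \max^{\mathrm{lex}}\{(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) : m = ([s], \alpha, R')\}, \text{ or}$$

$$(\widetilde{T}(s), \widetilde{D}(s)) = \min^{\mathrm{lex}}\{(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) : m = ([s], \alpha, R')\}.$$

If $\widehat{\Gamma}$ is 0-player then $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$ and $\mathrm{Opt}^{\mathrm{Min}}(\widehat{\Gamma})$ are equivalent to each other and denoted by $\mathrm{Opt}(\widehat{\Gamma})$.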

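We write $(T, D) \models \mathrm{Opt}^{\geq}(\widehat{\Gamma})$ or $(T, D) \models \mathrm{Opt}^{\leq}(\widehat{\Gamma})$, respectively, if for all $s \in F$, we have $(\widetilde{T}(s), \widetilde{D}(s)) \geq^{\mathrm{lex}} (0, 0)$ or $(\widetilde{T}(s), \widetilde{D}(s)) \leq^{\mathrm{lex}} (0, 0)$, respectively; and for all $s \in S \setminus F$, we have, respectively:

$$(\widetilde{T}(s), \widetilde{D}(s)) \geq^{\mathrm{lex}} \max^{\mathrm{lex}}\{(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) : m = ([s], \alpha, R')\}, \text{ or}$$

$$(\widetilde{T}(s), \widetilde{D}(s)) \leq^{\mathrm{lex}} \min^{\mathrm{lex}}\{(T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) : m = ([s], \alpha, R')\}.$$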

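Proposition 7 (Relaxations of optimality equations). If $(T, D) \models \mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$ then $(T, D) \models \mathrm{Opt}^{\geq}(\widehat{\Gamma})$, and if $(T, D) \models \mathrm{Opt}^{\mathrm{Min}}(\widehat{\Gamma})$ then $(T, D) \models \mathrm{Opt}^{\leq}(\widehat{\Gamma})$.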

Lemma 8 (Solution of $\mathrm{Opt}(\widehat{\Gamma})$ is regionally simple). Let $\widehat{\Gamma}$ be a 0-player timed region graph. If $(T, D) \models \mathrm{Opt}(\widehat{\Gamma})$ then $T$ is regionally simple and $D$ is regionally constant.

Solving 1-player maximum reachability-time optimality equations $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$. In this section we give a strategy improvement algorithm for solving maximum reachability-time optimality equations $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$ for a 1-player timed region graph $\widehat{\Gamma}$.

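We define the following strategy improvement operator $\mathrm{Improve}^{\mathrm{Max}}$:

$$
\mathrm{Improve}^{\mathrm{Max}}(\chi, (T, D))(s) =
\begin{cases}
\chi(s) & \text{if } \chi(s) \in \mathcal{M}^*(s, (T, D)),\\
\mathrm{Choose}(\mathcal{M}^*(s, (T, D))) & \text{if } \chi(s) \notin \mathcal{M}^*(s, (T, D)).
\end{cases}
$$

Note that $\mathrm{Improve}^{\mathrm{Max}}(\chi, (T, D))(s)$ may differ from the canonical $(T, D)$-optimal choice $\chi_{(T,D)}(s)$ only if $\chi(s)$ is itself $(T, D)$-optimal in state $s$, i.e., if $\chi(s) \in \mathcal{M}^*(s, (T, D))$.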

Lemma 9 (Improvement preserves regional constancy of strategies). If $\chi \in \Delta_{\mathrm{Max}}$ is regionally constant, $T : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ is regionally simple, and $D : \mathcal{R} \to [S \rightharpoonup \mathbb{N}]$ is regionally constant, then $\mathrm{Improve}^{\mathrm{Max}}(\chi, (T, D))$ is regionally constant.

Algorithm 1. Strategy improvement algorithm for $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$.

1. (Initialisation) Choose a regionally constant positional strategy $\chi_0$ for player Max in $\widehat{\Gamma}$; set $i := 0$.

2. (Value computation) Compute the solution $(T_i, D_i)$ of $\mathrm{Opt}(\widehat{\Gamma}{\restriction}\chi_i)$.

3. (Strategy improvement) If $\mathrm{Improve}^{\mathrm{Max}}(\chi_i, (T_i, D_i)) = \chi_i$, then return $(T_i, D_i)$.
Otherwise, set $\chi_{i+1} := \mathrm{Improve}^{\mathrm{Max}}(\chi_i, (T_i, D_i))$; set $i := i + 1$; and goto step 2.
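The steps of Algorithm 1 can be mirrored on a finite, discrete analogue of a 1-player graph, where each node has a list of moves given by a delay and a successor, and Min nodes have exactly one move. This is a sketch under those assumptions, not an implementation of the timed region graph (whose values are simple functions of the state rather than numbers); all names are hypothetical, and `Choose` is taken to be "the first optimal move in list order".

```python
import math
from typing import Dict, List, Set, Tuple

Move = Tuple[float, str]     # (delay, successor) in a hypothetical finite graph
Value = Tuple[float, float]  # (T, D), compared lexicographically

def evaluate(choice: Dict[str, Move], final: Set[str]) -> Dict[str, Value]:
    """Step 2 for a 0-player graph: follow the unique chosen move of each node."""
    value: Dict[str, Value] = {f: (0.0, 0.0) for f in final}
    for start in choice:
        path: List[str] = []
        on_path: Set[str] = set()
        s = start
        while s not in value and s not in on_path:
            on_path.add(s)
            path.append(s)
            s = choice[s][1]
        # Reuse a known value, or (inf, inf) if we closed a cycle that avoids F.
        t, d = value.get(s, (math.inf, math.inf))
        for u in reversed(path):
            t, d = t + choice[u][0], d + 1
            value[u] = (t, d)
    return value

def score(m: Move, value: Dict[str, Value]) -> Value:
    t, d = value[m[1]]
    return (m[0] + t, 1 + d)

def strategy_improvement_max(moves: Dict[str, List[Move]], owner: Dict[str, str],
                             final: Set[str]) -> Tuple[Dict[str, Value], Dict[str, Move]]:
    """Finite analogue of Algorithm 1; nodes owned by Min must have one move."""
    chi = {s: ms[0] for s, ms in moves.items()}              # step 1
    while True:
        value = evaluate(chi, final)                           # step 2
        changed = False
        for s, ms in moves.items():                            # step 3: Improve^Max
            if owner[s] != "Max":
                continue
            best = max(score(m, value) for m in ms)
            if score(chi[s], value) < best:                    # keep chi(s) if optimal
                chi[s] = next(m for m in ms if score(m, value) == best)  # Choose
                changed = True
        if not changed:
            return value, chi
```

On a graph where a Max node can reach the final node directly with delay 1 or take delay 2 into a Min node that needs 3 more, the sketch settles on the longer route with value $(5, 2)$; a Max node with a zero-delay self-loop ends with value $(\infty, \infty)$, reflecting that Max can avoid $F$ forever.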
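Proposition 10 (Fixpoints of $\mathrm{Improve}^{\mathrm{Max}}$ are solutions of $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$). Let $\chi \in \Delta_{\mathrm{Max}}$ and let $(T^{\chi}, D^{\chi}) \models \mathrm{Opt}(\widehat{\Gamma}{\restriction}\chi)$. If $\mathrm{Improve}^{\mathrm{Max}}(\chi, (T^{\chi}, D^{\chi})) = \chi$ then $(T^{\chi}, D^{\chi}) \models \mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$.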

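If $F, F' : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ then we write $F \leq F'$ if for all $R \in \mathcal{R}$, and for all $s \in \overline{R}$, we have $F(R)(s) \leq F'(R)(s)$. Moreover, $F < F'$ if $F \leq F'$ and there is $R \in \mathcal{R}$ and $s \in \overline{R}$, such that $F(R)(s) < F'(R)(s)$. If $F, G, F', G' : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ then $(F, G) \leq^{\mathrm{lex}} (F', G')$ if $F < F'$, or if $F = F'$ and $G \leq G'$.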

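Proposition 11 (Solution of $\mathrm{Opt}(\widehat{\Gamma})$ is the maximum solution of $\mathrm{Opt}^{\leq}(\widehat{\Gamma})$). Let $T, T_{\leq} : \mathcal{R} \to [S \rightharpoonup \mathbb{R}]$ and $D, D_{\leq} : \mathcal{R} \to [S \rightharpoonup \mathbb{N}]$ be such that $(T, D) \models \mathrm{Opt}(\widehat{\Gamma})$ and $(T_{\leq}, D_{\leq}) \models \mathrm{Opt}^{\leq}(\widehat{\Gamma})$. Then we have $(T_{\leq}, D_{\leq}) \leq^{\mathrm{lex}} (T, D)$, and if $(T_{\leq}, D_{\leq}) \not\models \mathrm{Opt}(\widehat{\Gamma})$ then $(T_{\leq}, D_{\leq}) <^{\mathrm{lex}} (T, D)$.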

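Proof. Our first goal is to establish that for every $s \in S$, we have $(\widetilde{T}_{\leq}(s), \widetilde{D}_{\leq}(s)) \leq^{\mathrm{lex}} (\widetilde{T}(s), \widetilde{D}(s))$. We proceed by induction on $\widetilde{D}(s)$, i.e., on the length of the $\chi_{(T,D)}$-path in $\widehat{\Gamma}$ from $[s]$ to a final region. The trivial base case is when $[s]$ is a final region, because then $(\widetilde{T}(s), \widetilde{D}(s)) = (0, 0)$ and $(\widetilde{T}_{\leq}(s), \widetilde{D}_{\leq}(s)) \leq^{\mathrm{lex}} (0, 0)$. Let $s \in S \setminus F$ be such that $\widetilde{D}(s) = n + 1$, and let $\chi_{(T,D)}(s) = ([s], \alpha, R')$. Then $\widetilde{D}(\mathrm{Succ}(s, \alpha)) = n$ and we have the following:

$$(\widetilde{T}_{\leq}(s), \widetilde{D}_{\leq}(s)) \leq^{\mathrm{lex}} (T_{\leq}(R')^{\oplus}_{\alpha}(s), D_{\leq}(R')^{\boxplus}_{\alpha}(s)) \leq^{\mathrm{lex}} (T(R')^{\oplus}_{\alpha}(s), D(R')^{\boxplus}_{\alpha}(s)) = (\widetilde{T}(s), \widetilde{D}(s)) \tag{1}$$

where the first inequality follows from $(T_{\leq}, D_{\leq}) \models \mathrm{Opt}^{\leq}(\widehat{\Gamma})$, the second inequality follows from the induction hypothesis, and the last equality follows from $(T, D) \models \mathrm{Opt}(\widehat{\Gamma})$ and $\chi_{(T,D)}(s) = ([s], \alpha, R')$. This concludes the proof that $(T_{\leq}, D_{\leq}) \leq^{\mathrm{lex}} (T, D)$.

We prove that if $(T_{\leq}, D_{\leq}) \not\models \mathrm{Opt}(\widehat{\Gamma})$ then there is $s \in S$, such that $(\widetilde{T}_{\leq}(s), \widetilde{D}_{\leq}(s)) <^{\mathrm{lex}} (\widetilde{T}(s), \widetilde{D}(s))$. Indeed, if $(T_{\leq}, D_{\leq}) \not\models \mathrm{Opt}(\widehat{\Gamma})$ then either $(\widetilde{T}_{\leq}(s), \widetilde{D}_{\leq}(s)) <^{\mathrm{lex}} (0, 0)$ for some $s \in F$, or there is $s \in S \setminus F$, for which the first inequality in (1) is strict and hence we get $(\widetilde{T}_{\leq}(s), \widetilde{D}_{\leq}(s)) <^{\mathrm{lex}} (\widetilde{T}(s), \widetilde{D}(s))$. □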

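Lemma 12 (Strict strategy improvement for Max). Let $\chi, \chi' \in \Delta_{\mathrm{Max}}$, let $(T, D) \models \mathrm{Opt}^{\mathrm{Min}}(\widehat{\Gamma}{\restriction}\chi)$ and $(T', D') \models \mathrm{Opt}^{\mathrm{Min}}(\widehat{\Gamma}{\restriction}\chi')$, and let $\chi' = \mathrm{Improve}^{\mathrm{Max}}(\chi, (T, D))$. Then $(T, D) \leq^{\mathrm{lex}} (T', D')$, and if $\chi \neq \chi'$ then $(T, D) <^{\mathrm{lex}} (T', D')$.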

The following theorem is an immediate corollary of Lemmas 8 and 9 (the algorithm considers only regionally constant strategies), of Lemma 12 and finiteness of the number of regionally constant positional strategies for Max (the algorithm terminates), and of Proposition 10 (the algorithm returns a solution of optimality equations).

Theorem 13 (Correctness and termination of strategy improvement for $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$). The strategy improvement algorithm for $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$ terminates in finitely many steps and returns a solution $(T, D)$ of $\mathrm{Opt}^{\mathrm{Max}}(\widehat{\Gamma})$, such that $T$ is regionally simple and $D$ is regionally constant.

Solving 2-player reachability-time optimality equations $\mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$. In this section we give a strategy improvement algorithm for solving the optimality equations $\mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$ for a 2-player timed region graph $\Gamma$. The structure of the algorithm is very similar to that of Algorithm 1. The only difference is that in step 2 of every iteration we solve the 1-player optimality equations $\mathrm{Opt}_{\mathrm{Max}}(\Gamma{\restriction}\mu)$ instead of the 0-player optimality equations $\mathrm{Opt}(\Gamma{\restriction}\chi)$. Note that step 2 of Algorithm 2 below can be performed by using Algorithm 1.
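We define the following strategy improvement operator $\mathrm{Improve}_{\mathrm{Min}}$:

$$\mathrm{Improve}_{\mathrm{Min}}(\mu, (T, D))(s) = \begin{cases} \mu(s) & \text{if } \mu(s) \in \mathcal{M}^{*}(s, (T, D)), \\ \mathrm{Choose}\big(\mathcal{M}^{*}(s, (T, D))\big) & \text{if } \mu(s) \notin \mathcal{M}^{*}(s, (T, D)). \end{cases}$$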
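The operator acts pointwise: at each state it keeps Min's current edge whenever that edge is already optimal, and only otherwise switches. The Python sketch below shows this decision rule at a single state. It is an illustration under stated assumptions: the candidate edges from $[s]$ and their pairs $(T(R')^{\oplus}(s), D(R')^{\oplus}(s))$ are taken as already computed, $\mathcal{M}^{*}$ for Min is assumed (by symmetry with the arg max used for Max in the proof of Lemma 9) to collect the lexicographically least pairs, $\mathrm{Choose}$ is replaced by a deterministic pick, and all function names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

# (T(R')^⊕(s), D(R')^⊕(s)); Python tuples already compare lexicographically.
Value = Tuple[float, float]


def optimal_edges_min(succ_value: Dict[Hashable, Value]) -> List[Hashable]:
    """Edges whose (time, distance) pair is lexicographically minimal (stand-in for M*)."""
    best = min(succ_value.values())
    return [m for m, v in succ_value.items() if v == best]


def improve_min_at(current: Hashable, succ_value: Dict[Hashable, Value]) -> Hashable:
    """Keep the current edge if it is already optimal; otherwise choose an optimal one."""
    best = optimal_edges_min(succ_value)
    if current in best:
        return current
    # Deterministic stand-in for Choose: the first optimal edge in insertion order.
    return best[0]
```

Keeping $\mu(s)$ when it is already optimal is what makes "no change" a meaningful stopping test in step 3 of Algorithm 2: a strategy is a fixpoint exactly when no state has a strictly better edge.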
Lemma 14 (Improvement preserves regional constancy of strategies). If $\mu \in \Lambda_{\mathrm{Min}}$ is regionally constant, $T : \mathcal{R} \to [S \to \mathbb{R}]$ is regionally simple, and $D : \mathcal{R} \to [S \to \mathbb{R}]$ is regionally constant, then $\mathrm{Improve}_{\mathrm{Min}}(\mu, (T, D))$ is regionally constant.

Algorithm 2. Strategy improvement algorithm for solving $\mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$.

1. (Initialisation) Choose a regionally constant positional strategy $\mu_0$ for player Min in $\Gamma$; set $i := 0$.

2. (Value computation) Compute the solution $(T_i, D_i)$ of $\mathrm{Opt}_{\mathrm{Max}}(\Gamma{\restriction}\mu_i)$.

3. (Strategy improvement) If $\mathrm{Improve}_{\mathrm{Min}}(\mu_i, (T_i, D_i)) = \mu_i$, then return $(T_i, D_i)$. Otherwise, set $\mu_{i+1} := \mathrm{Improve}_{\mathrm{Min}}(\mu_i, (T_i, D_i))$; set $i := i + 1$; and goto step 2.
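Algorithm 2 is a nested strategy improvement: an outer loop over Min strategies, with step 2 solved by an inner loop over Max strategies (Algorithm 1), whose evaluation step is the 0-player case. The Python sketch below mirrors that control flow on a deliberately simplified, untimed stand-in that we assume for illustration only: a finite game graph whose edges carry fixed nonnegative durations in place of the functions $T(R')^{\oplus}$, with values compared as lexicographic (time, steps) pairs and $(\infty, \infty)$ for plays that never reach a final vertex. Every non-final vertex is assumed to have at least one edge, initial strategies take the first listed edge, $\mathrm{Choose}$ is a deterministic pick, and all names are hypothetical. It is not the region-based algorithm itself.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

INF = math.inf
Edge = Tuple[int, float]    # (successor vertex, nonnegative duration): hypothetical encoding
Pair = Tuple[float, float]  # (time, steps), compared lexicographically


def score(e: Edge, val: Dict[int, Pair]) -> Pair:
    """Pair obtained by taking edge e: duration plus successor time, one more step."""
    v, dur = e
    t, d = val[v]
    return (dur + t, 1 + d)


def evaluate(choice: Dict[int, Edge], final: Set[int]) -> Dict[int, Pair]:
    """0-player case: every non-final vertex follows its single chosen edge."""
    val: Dict[int, Pair] = {f: (0.0, 0) for f in final}
    for start in choice:
        path: List[int] = []
        seen: Set[int] = set()
        v = start
        while v not in val and v not in seen:
            seen.add(v)
            path.append(v)
            v = choice[v][0]
        tail = val.get(v, (INF, INF))  # revisited without a value: a cycle avoiding F
        for u in reversed(path):
            tail = (choice[u][1] + tail[0], 1 + tail[1])
            val[u] = tail
    return val


def improve(strategy: Dict[int, Edge], edges: Dict[int, List[Edge]],
            val: Dict[int, Pair], pick: Callable) -> Dict[int, Edge]:
    """Keep the current edge if optimal for pick (min or max); otherwise switch."""
    new: Dict[int, Edge] = {}
    for v, cur in strategy.items():
        best = pick(score(e, val) for e in edges[v])
        new[v] = cur if score(cur, val) == best else pick(edges[v], key=lambda e: score(e, val))
    return new


def solve_max(edges: Dict[int, List[Edge]], owner: Dict[int, str],
              final: Set[int], mu: Dict[int, Edge]) -> Dict[int, Pair]:
    """Step 2: Max's best response to the fixed Min strategy mu (Algorithm 1 style)."""
    chi = {v: es[0] for v, es in edges.items() if owner[v] == "max"}
    while True:
        val = evaluate({**mu, **chi}, final)
        new_chi = improve(chi, edges, val, max)
        if new_chi == chi:
            return val
        chi = new_chi


def strategy_improvement(edges: Dict[int, List[Edge]], owner: Dict[int, str],
                         final: Set[int]) -> Tuple[Dict[int, Pair], Dict[int, Edge]]:
    """Algorithm 2 on the discrete stand-in; returns the values and Min's final strategy."""
    mu = {v: es[0] for v, es in edges.items() if owner[v] == "min"}
    while True:
        val = solve_max(edges, owner, final, mu)
        new_mu = improve(mu, edges, val, min)
        if new_mu == mu:
            return val, mu
        mu = new_mu
```

For example, take Min vertices 0 and 2, Max vertices 1 and 4, and final vertex 3, with edges `{0: [(1, 1.0), (2, 4.0)], 1: [(3, 2.0), (3, 5.0)], 2: [(3, 0.0)], 4: [(3, 1.0), (4, 0.0)]}`. Once Max answers with the 5.0 edge at vertex 1, Min's edge to 1 is worth (6.0, 2), so Min switches vertex 0 to the edge to 2 and ends with (4.0, 2). At vertex 4 the step-count tie-break makes Max prefer the zero-duration self-loop, which the 0-player evaluation then values at (inf, inf).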

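Proposition 15 (Fixpoints of $\mathrm{Improve}_{\mathrm{Min}}$ are solutions of $\mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$). Let $\mu \in \Lambda_{\mathrm{Min}}$ and $(T_\mu, D_\mu) \models \mathrm{Opt}_{\mathrm{Max}}(\Gamma{\restriction}\mu)$. If $\mathrm{Improve}_{\mathrm{Min}}(\mu, (T_\mu, D_\mu)) = \mu$, then $(T_\mu, D_\mu) \models \mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$.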

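Proposition 16 (Solution of $\mathrm{Opt}_{\mathrm{Max}}(\Gamma)$ is the minimum solution of $\mathrm{Opt}_{\ge}(\Gamma)$). Let $T, T_{\ge} : \mathcal{R} \to [S \to \mathbb{R}]$ and $D, D_{\ge} : \mathcal{R} \to [S \to \mathbb{R}]$ be such that $(T, D) \models \mathrm{Opt}_{\mathrm{Max}}(\Gamma)$ and $(T_{\ge}, D_{\ge}) \models \mathrm{Opt}_{\ge}(\Gamma)$. Then $(T_{\ge}, D_{\ge}) \ge_{\mathrm{lex}} (T, D)$, and if $(T_{\ge}, D_{\ge}) \not\models \mathrm{Opt}_{\mathrm{Max}}(\Gamma)$ then $(T_{\ge}, D_{\ge}) >_{\mathrm{lex}} (T, D)$.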

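Lemma 17 (Strict strategy improvement for Min). Let $\mu, \mu' \in \Lambda_{\mathrm{Min}}$, let $(T, D) \models \mathrm{Opt}_{\mathrm{Max}}(\Gamma{\restriction}\mu)$ and $(T', D') \models \mathrm{Opt}_{\mathrm{Max}}(\Gamma{\restriction}\mu')$, and let $\mu' = \mathrm{Improve}_{\mathrm{Min}}(\mu, (T, D))$. Then $(T, D) \ge_{\mathrm{lex}} (T', D')$, and if $\mu \neq \mu'$ then $(T, D) >_{\mathrm{lex}} (T', D')$.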

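Proof. First we argue that $(T, D) \models \mathrm{Opt}_{\ge}(\Gamma{\restriction}\mu')$, which by Proposition 16 implies that $(T, D) \ge_{\mathrm{lex}} (T', D')$. Indeed, for every $s \in S_{\mathrm{Min}} \setminus F$, if $\mu(s) = ([s], a, R)$ and $\mu'(s) = ([s], a', R')$, then we have

$$(T(s), D(s)) = \big(T(R)^{\oplus}(s), D(R)^{\oplus}(s)\big) \ge_{\mathrm{lex}} \big(T(R')^{\oplus}(s), D(R')^{\oplus}(s)\big),$$

where the equality follows from $(T, D) \models \mathrm{Opt}_{\mathrm{Max}}(\Gamma{\restriction}\mu)$, and the inequality follows from the definition of $\mathrm{Improve}_{\mathrm{Min}}$. Moreover, if $\mu \neq \mu'$ then there is $s \in S_{\mathrm{Min}} \setminus F$ for which the above inequality is strict. Then $(T, D) \not\models \mathrm{Opt}_{\mathrm{Max}}(\Gamma{\restriction}\mu')$, because every vertex $R \in \mathcal{R}_{\mathrm{Min}}$ in $\Gamma{\restriction}\mu'$ has a unique successor, and hence again by Proposition 16 we conclude that $(T, D) >_{\mathrm{lex}} (T', D')$. □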

The following theorem is an immediate corollary of Theorem 13 and Lemma 14, of Lemma 17 and the finiteness of the number of regionally constant positional strategies for Min, and of Proposition 15.

Theorem 18 (Correctness and termination of strategy improvement for $\mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$). The strategy improvement algorithm for $\mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$ terminates in finitely many steps and returns a solution $(T, D)$ of $\mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$ such that $T$ is regionally simple and $D$ is regionally constant.

5 Complexity 

Lemma 19 (Complexity of strategy improvement). Let $\Gamma_0$, $\Gamma_1$, and $\Gamma_2$ be 0-player, 1-player, and 2-player timed region graphs, respectively. A solution of $\mathrm{Opt}(\Gamma_0)$ can be computed in time $O(|\mathcal{R}|)$. The strategy improvement algorithms for $\mathrm{Opt}_{\mathrm{Max}}(\Gamma_1)$ and $\mathrm{Opt}_{\mathrm{MinMax}}(\Gamma_2)$ terminate in $O(|\mathcal{R}|)$ iterations and hence run in $O(|\mathcal{R}|^2)$ and $O(|\mathcal{R}|^3)$ time, respectively.

Since the number $|\mathcal{R}|$ of regions is at most exponential in the size of a timed automaton [3], we conclude that the strategy improvement algorithm solves reachability-time games in exponential time.
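To make "at most exponential" concrete, recall the classical upper bound on the number of clock regions from the Alur-Dill construction: $|C|! \cdot 2^{|C|} \cdot \prod_{c \in C} (2k_c + 2)$, where $k_c$ is the largest integer constant that clock $c$ is compared with. The Python helper below only evaluates that formula; its name and interface are ours, and it is an illustration rather than part of the argument.

```python
from math import factorial, prod
from typing import Sequence


def region_bound(max_consts: Sequence[int]) -> int:
    """Alur-Dill upper bound on the number of clock regions.

    max_consts[i] is the largest integer constant the i-th clock is compared with.
    """
    n = len(max_consts)
    return factorial(n) * 2 ** n * prod(2 * k + 2 for k in max_consts)
```

The product is linear in each constant, and hence exponential in its binary encoding, while the factor $|C|! \cdot 2^{|C|}$ is exponential in the number of clocks. For instance, one clock with constant 1 gives a bound of 8, two such clocks give 128, and one clock with constant 1024 already gives 4100.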

Corollary 20. The problem of solving reachability-time games is in EXPTIME. 
Courcoubetis and Yannakakis proved that the reachability problem for timed automata with at least three clocks is PSPACE-complete [14]. We complement their result by showing that solving 2-player reachability games on timed automata with at least two clocks is EXPTIME-complete. Note that the best currently known lower bound for the reachability problem for timed automata with two clocks is NP-hardness [20].

Theorem 21 (Complexity of reachability games on timed automata). The problem of solving reachability games 
is EXPTIME-complete on timed automata with at least two clocks. 

Theorem 22 (Complexity of reachability-time games on timed automata). The problem of solving reachability- 
time games is EXPTIME-complete on timed automata with at least two clocks. 
Appendix

Proofs from Section 2

Proof of Lemma 2 ($\varepsilon$-optimal strategies from optimality equations). We show that for every $\varepsilon > 0$, there exists a positional strategy $\mu_\varepsilon : S_{\mathrm{Min}} \to A \times \mathbb{R}_{\ge 0}$ for player Min such that for every strategy $\chi$ for player Max, if $s \in S$ is such that $D(s) < \infty$, then we have $\mathrm{RT}(\mathrm{Run}(s, \mu_\varepsilon, \chi)) \le T(s) + \varepsilon$. The proof that for every $\varepsilon > 0$ there exists a positional strategy $\chi_\varepsilon : S_{\mathrm{Max}} \to A \times \mathbb{R}_{\ge 0}$ for player Max such that for every strategy $\mu$ for player Min, if $s \in S$ is such that $D(s) < \infty$ then we have $\mathrm{RT}(\mathrm{Run}(s, \mu, \chi_\varepsilon)) \ge T(s) - \varepsilon$, is similar and omitted. The proof that if $D(s) = \infty$ then player Max has a strategy to prevent ever reaching a final state is routine and omitted as well. Together, these facts imply that $T$ is equal to the value function of the reachability-time game, and that the positional strategies $\mu_\varepsilon$ and $\chi_\varepsilon$, defined in the proof below for all $\varepsilon > 0$, are $\varepsilon$-optimal.

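For $\varepsilon' > 0$, $T : S \to \mathbb{R}$, and $s \in S_{\mathrm{Min}} \setminus F$, we say that a timed action $(a, t) \in A \times \mathbb{R}_{\ge 0}$ is $\varepsilon'$-optimal for $(T, D)$ in $s$ if $s \xrightarrow{a}_{t} s'$ and

$$D(s') \le D(s) - 1, \quad \text{and} \qquad (2)$$
$$t + T(s') \le T(s) + \varepsilon'. \qquad (3)$$

Observe that for every state $s \in S_{\mathrm{Min}} \setminus F$ and for every $\varepsilon' > 0$, there is an $\varepsilon'$-optimal timed action for $(T, D)$ in $s$, because $(T, D) \models \mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$. Moreover, again by $(T, D) \models \mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$, for every $s \in S_{\mathrm{Max}} \setminus F$ and every timed action $(a, t)$ such that $s \xrightarrow{a}_{t} s'$, we have

$$D(s') \le D(s) - 1, \quad \text{and} \qquad (4)$$
$$t + T(s') \le T(s). \qquad (5)$$

Let $\varepsilon > 0$; we define $\mu_\varepsilon : S_{\mathrm{Min}} \to A \times \mathbb{R}_{\ge 0}$ by setting $\mu_\varepsilon(s)$, for every $s \in S_{\mathrm{Min}}$, to be a timed action which is $\varepsilon'(s)$-optimal for $(T, D)$ in $s$, where $\varepsilon'(s) > 0$ is sufficiently small (to be determined later). Let $\chi$ be an arbitrary strategy for player Max and let $r = \mathrm{Run}(s, \mu_\varepsilon, \chi) = \langle s_0, (a_1, t_1), s_1, (a_2, t_2), \ldots \rangle$. Let $N = \mathrm{Stop}(r)$. Our goal is to prove that $\mathrm{RT}(r) \le T(s) + \varepsilon$, i.e., that $T(s) \ge \sum_{k=1}^{N} t_k - \varepsilon$.

For every state $s \in S$ such that $D(s) < \infty$, define $\varepsilon'(s) = \varepsilon \cdot 2^{-D(s)}$. If we add the left- and right-hand sides of inequality (3) or (5), depending on whether $s_i \in S_{\mathrm{Min}}$ (with the $\varepsilon'(s_i)$-optimal timed action $\mu_\varepsilon(s_i)$) or $s_i \in S_{\mathrm{Max}}$, for $i = 0, 1, \ldots, N - 1$, then the terms $T(s_1), \ldots, T(s_{N-1})$ cancel and we get

$$T(s) = T(s_0) \ge \sum_{k=1}^{N} t_k - \sum_{k=0}^{N-1} \varepsilon'(s_k) \ge \sum_{k=1}^{N} t_k - \varepsilon.$$

The first inequality holds by $T(s_N) = T(s_{\mathrm{Stop}(r)}) = 0$, and the second inequality holds because

$$\sum_{k=0}^{N-1} \varepsilon'(s_k) = \sum_{k=0}^{N-1} \varepsilon \cdot 2^{-D(s_k)} \le \varepsilon \cdot \sum_{d=1}^{\infty} 2^{-d} = \varepsilon,$$

where the inequality follows from (2) and (4), which imply that $D(s_0), D(s_1), \ldots, D(s_{N-1})$ are pairwise distinct positive integers.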
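The bookkeeping behind $\varepsilon'(s) = \varepsilon \cdot 2^{-D(s)}$ can be checked with exact rational arithmetic. This Python sketch (our illustration, with a hypothetical helper name) sums the slack along a run whose $D$-values strictly decrease and stay at least 1 before the run stops, which is what (2) and (4) guarantee for $s_0, \ldots, s_{N-1}$.

```python
from fractions import Fraction
from typing import List


def total_slack(eps: Fraction, d_values: List[int]) -> Fraction:
    """Sum of eps'(s_k) = eps * 2^(-D(s_k)) over the non-final states of a run.

    d_values lists D(s_0), ..., D(s_{N-1}); they must strictly decrease and end >= 1,
    so the sum is bounded by eps * (1/2 + 1/4 + ...) = eps.
    """
    if not d_values:
        return Fraction(0)
    assert all(x > y for x, y in zip(d_values, d_values[1:])) and d_values[-1] >= 1
    return sum((eps / 2 ** d for d in d_values), Fraction(0))
```

With $\varepsilon = 1/10$ and the worst-case run $D = 5, 4, 3, 2, 1$, the total slack is $\tfrac{1}{10} \cdot \tfrac{31}{32} = \tfrac{31}{320}$, strictly below $\varepsilon$.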

It may be worth noting that if the finite values of the function $D$ are bounded, i.e., if $B < \infty$, where $B = \sup_{s \in S} \{D(s) : D(s) < \infty\}$, then in the above proof it is sufficient to define $\varepsilon'(s) = \varepsilon / B$ for all $s \in S$, which arguably gives more realistic, "physically implementable" $\varepsilon$-optimal strategies. □

Proof of Lemma 3. We prove the lemma for the functions $\min(F, F')$ and $\max(F, F')$ on the region $R$; extending the result to their unique continuous extensions is routine. The case when both $F$ and $F'$ are constant functions is straightforward. Hence it suffices to consider the following two cases.

Case 1. Let $F(s) = e - s(c)$ and $F'(s) = e'$, for some $e, e' \in \mathbb{Z}$ and a clock $c \in C$. Note that for every state $s \in R$ we have $\lfloor F'(s) - F(s) \rfloor = (e' - e) + \lfloor s(c) \rfloor$, and hence $\lfloor F' - F \rfloor$ is a constant function in region $R$. Therefore either $F'(s) - F(s) \ge 0$ for all $s \in R$, or $F'(s) - F(s) \le 0$ for all $s \in R$, i.e., either $\min(F, F') = F$ and $\max(F, F') = F'$, or $\min(F, F') = F'$ and $\max(F, F') = F$.

Case 2. Let $F(s) = e - s(c)$ and $F'(s) = e' - s(c')$, for some $e, e' \in \mathbb{Z}$ and clocks $c, c' \in C$. Note that for every state $s \in R$ we have $\lfloor F'(s) - F(s) \rfloor = (e' - e) + \lfloor s(c) - s(c') \rfloor$ and

$$\lfloor s(c) - s(c') \rfloor = \begin{cases} \lfloor s(c) \rfloor - \lfloor s(c') \rfloor & \text{if } \mathrm{frac}(s(c')) \le \mathrm{frac}(s(c)), \\ \lfloor s(c) \rfloor - \lfloor s(c') \rfloor - 1 & \text{if } \mathrm{frac}(s(c)) < \mathrm{frac}(s(c')), \end{cases}$$

where $\mathrm{frac}(x) = x - \lfloor x \rfloor$. Since the integer parts of the clock values and the ordering of their fractional parts are fixed within region $R$, as in the previous case $\lfloor F' - F \rfloor$ is a constant function in region $R$, and hence one of the functions $F$ or $F'$ is equal to $\max(F, F')$ and the other is equal to $\min(F, F')$. □
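Case 2 rests on the identity $\lfloor x - y \rfloor = \lfloor x \rfloor - \lfloor y \rfloor - [\mathrm{frac}(x) < \mathrm{frac}(y)]$, whose right-hand side depends only on the data a region fixes: integer parts and the order of fractional parts. The Python sketch below (an illustration with hypothetical helper names) computes it that way, so it can be compared against the direct floor on sample values.

```python
import math
from fractions import Fraction


def frac(x: Fraction) -> Fraction:
    """Fractional part x - floor(x)."""
    return x - math.floor(x)


def floor_diff_from_region_data(x: Fraction, y: Fraction) -> int:
    """floor(x - y) computed only from floor(x), floor(y) and the order of frac(x), frac(y)."""
    base = math.floor(x) - math.floor(y)
    return base if frac(y) <= frac(x) else base - 1
```

Exact `Fraction` values avoid floating-point rounding at the boundary case $\mathrm{frac}(x) = \mathrm{frac}(y)$, where the difference is an integer.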

Proof of Proposition 4. Let $\alpha = (a, b, c)$. If $F$ is a constant function, i.e., if there is some $e \in \mathbb{Z}$ such that for all $s' \in R'$ we have $F(s') = e$, then $F^{\oplus}(s) = t(s, a) + e$. If $s(c) \ge b$ for all $s \in R$, then $t(s, a) = 0$ for all $s \in R$, and hence $F^{\oplus}(s) = e$ and $F^{\oplus}$ is simple. If instead $s(c) \le b$ for all $s \in R$, then $F^{\oplus}(s) = (b - s(c)) + e = (b + e) - s(c)$, and hence it is a simple function.

The other case is when $F$ is not a constant function, i.e., if there are a constant $e \in \mathbb{Z}$ and a clock $c' \in C$ such that for all $s' \in R'$ we have $F(s') = e - s'(c')$. We consider two subcases.

If $c' \in \rho(a)$ then $F^{\oplus}(s) = t(s, a) + (e - s'(c')) = t(s, a) + e$, because by the assumption that $c' \in \rho(a)$ we have $s'(c') = 0$. If $s(c) \ge b$ for all $s \in R$, then $t(s, a) = 0$ for all $s \in R$, and hence $F^{\oplus}(s) = e$, which is a simple function. If instead $s(c) \le b$ for all $s \in R$, then $F^{\oplus}(s) = (b + e) - s(c)$, which is also a simple function.

If instead $c' \notin \rho(a)$ then $F^{\oplus}(s) = t(s, a) + (e - (s(c') + t(s, a))) = e - s(c')$, because by the assumption that $c' \notin \rho(a)$ we have $s'(c') = s(c') + t(s, a)$, and hence $F^{\oplus}$ is a simple function. □
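The case analysis above amounts to a small symbolic rewrite on simple functions. The Python sketch below encodes it and cross-checks it numerically. The encoding is our assumption: a simple function is `("const", e)` for $F(s) = e$ or `("clock", e, x)` for $F(s) = e - s(x)$; the delay is taken to be $t(s, a) = \max(0, b - s(c))$, which agrees with both cases in the proof; and reset clocks are set to 0, as in the proof. The flag `late` records which of the two regional cases, $s(c) \ge b$ or $s(c) \le b$, holds on $R$. All names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: ("const", e) means F(s) = e; ("clock", e, x) means F(s) = e - s(x).
Simple = Union[Tuple[str, float], Tuple[str, float, str]]


def value(F: Simple, s: Dict[str, float]) -> float:
    return F[1] if F[0] == "const" else F[1] - s[F[2]]


def oplus(F: Simple, b: float, c: str, reset: Set[str], late: bool) -> Simple:
    """Symbolic F^⊕ on a region R, following the case split of the proof.

    late=True encodes s(c) >= b on R (no waiting); late=False encodes s(c) <= b on R.
    """
    if F[0] == "clock" and F[2] not in reset:
        return ("clock", F[1], F[2])  # the waiting time cancels out: e - s(c')
    e = F[1]                          # F is constant, or its clock is reset to 0
    return ("const", e) if late else ("clock", b + e, c)


def oplus_direct(F: Simple, s: Dict[str, float], b: float, c: str, reset: Set[str]) -> float:
    """Numeric F^⊕(s) = t(s, a) + F(s'), assuming t(s, a) = max(0, b - s(c))."""
    t = max(0.0, b - s[c])
    succ = {x: (0.0 if x in reset else v + t) for x, v in s.items()}
    return t + value(F, succ)
```

In every branch the result is again of one of the two simple shapes, which is exactly the closure property the proposition asserts.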

Proof of Proposition 5. We consider two cases. If $F$ is a constant function, i.e., if there is $e \in \mathbb{Z}$ such that for all $s' \in R'$ we have $F(s') = e$, then $F^{\oplus}_{a}(t) = t + F(\mathrm{Succ}(s, (a, t))) = t + e$, which is a continuous and nondecreasing function of $t$.

The other case is when $F$ is not a constant function, i.e., if there are a constant $e \in \mathbb{Z}$ and a clock $c' \in C$ such that for all $s' \in R'$ we have $F(s') = e - s'(c')$. We consider two subcases. If $c' \in \rho(a)$ then $F^{\oplus}_{a}(t) = t + e$, which is continuous and nondecreasing. If instead $c' \notin \rho(a)$ then $F^{\oplus}_{a}(t) = t + (e - (s + t)(c')) = t + e - (s(c') + t) = e - s(c')$, i.e., $F^{\oplus}_{a}$ is a constant function and hence continuous and nondecreasing. □
Proofs from Section 3

Proof of Theorem 6 (Correctness of reduction to timed region graphs). Now we prove the equality (b).

$$\begin{aligned} D(s) &= \min_{m \in \mathcal{M}} \big\{ D(R')^{\oplus}(s) : T(s) = T(R')^{\oplus}(s) \text{ and } m = ([s], a, R') \big\} \\ &= \min \big\{ 1 + d' : T(s) = T(R')^{\oplus}(s) \text{ and } ([s], a, R') \in \mathcal{M} \text{ and } D(R') = d' \big\} \\ &= \min_{d' \in \mathbb{N}} \Big\{ 1 + d' : T(s) = \inf_{a, t} \big\{ t + T(s') : s \xrightarrow{a}_{t} s' \text{ and } D(s') = d' \big\} \Big\}. \end{aligned}$$

The first equality holds by the assumption that $(T, D) \models \mathrm{Opt}_{\mathrm{MinMax}}(\Gamma)$. The second equality holds because of the assumption that $D$ is regionally constant, and we write $D(R') = d'$, where $d' \in \mathbb{N}$, to express that for all $s \in R'$ we have $D(R')(s) = d'$. Finally, to establish the third equality it is sufficient to perform a calculation analogous to the above proof of (a), in order to show that

$$T(s) = T(R')^{\oplus}(s) \text{ and } ([s], a, R') \in \mathcal{M} \text{ and } D(R') = d'$$

if and only if

$$T(s) = \inf_{a, t} \big\{ t + T(s') : s \xrightarrow{a}_{t} s' \text{ and } D(s') = d' \big\}.$$

□



Proofs from Section 4

Proof of Lemma 8 (Solution of $\mathrm{Opt}(\Gamma)$ is regionally simple). In a 0-player timed region graph $\Gamma$, for every region $R$ there is at most one outgoing labelled edge $(R, a, R') \in \mathcal{M}$, and hence for every region $R$ there is a unique $\mathcal{M}$-path from $R$ in $\Gamma$. For every region $R \in \mathcal{R}$, we define the distance $d(R) \in \mathbb{N} \cup \{\infty\}$ to be the number of edges of the unique $\mathcal{M}$-path from $R$ that one needs to traverse to reach a final region, with $d(R) = \infty$ if that path never reaches one. It is easy to show that for every state $s \in S$ we have $D([s])(s) = d([s])$, and hence $D$ is regionally constant.

We prove that for every region $R \in \mathcal{R}$, the function $T(R) : R \to \mathbb{R}$ is simple, by induction on $d(R)$. If $d(R) = 0$ then $T(R)(s) = 0$ for all $s \in R$, and hence $T(R)$ is simple on $R$.

Let $d(R) = n + 1$ and let $(R, a, R') \in \mathcal{M}$ be the unique edge going out of $R$ in $\Gamma$. Observe that $T(R) = T(R')^{\oplus}$, because for every $s \in R$ we have $T(R)(s) = T([s])(s) = T(R')^{\oplus}(s)$, where the second equality follows from $(T, D) \models \mathrm{Opt}(\Gamma)$. Moreover, by the induction hypothesis the function $T(R') : R' \to \mathbb{R}$ is simple, and hence by Proposition 4 we get that $T(R')^{\oplus} = T(R)$ is simple.

If $d(R) = \infty$, i.e., if the unique $\mathcal{M}$-path from $R$ in $\Gamma$ never reaches a final region, then we set $T(R)(s) = \infty$ for all $s \in R$. Therefore $T(R)$ is a constant function on $R$, and hence it is simple. □
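The distance $d(R)$ used in this proof can be computed by following each region's unique edge once and memoising the result, in the spirit of the linear-time bound stated in Lemma 19. The Python sketch below is an illustration under our own assumptions: the 0-player graph is given as a hypothetical dictionary `succ` mapping every non-final region to the target of its unique edge (or `None` if it has none), and only $d$ is computed, not $T$.

```python
import math
from typing import Dict, Hashable, List, Optional, Set


def distances(succ: Dict[Hashable, Optional[Hashable]],
              final: Set[Hashable]) -> Dict[Hashable, float]:
    """d(R) for a 0-player graph in which succ[R] is the target of R's unique edge.

    Each region is appended to a path at most once, so the total work is linear.
    """
    d: Dict[Hashable, float] = {r: 0 for r in final}
    for start in succ:
        path: List[Hashable] = []
        on_path: Set[Hashable] = set()
        r: Optional[Hashable] = start
        while r is not None and r not in d and r not in on_path:
            on_path.add(r)
            path.append(r)
            r = succ[r]
        # A dead end, or a cycle that avoids the final regions, gives distance infinity.
        k = d[r] if r is not None and r in d else math.inf
        for q in reversed(path):
            k += 1
            d[q] = k
    return d
```

For `succ = {"A": "B", "B": "F", "C": "D", "D": "C", "E": None}` with final region `"F"`, this gives $d(A) = 2$, $d(B) = 1$, and $d = \infty$ for the cycle $C, D$ and for the dead end $E$.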

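Proof of Lemma 9 (Improvement preserves regional constancy of strategies). We need to prove that for $s, s' \in S$, if $[s] = [s']$ then $\chi'(s) = \chi'(s')$, where $\chi' = \mathrm{Improve}_{\mathrm{Max}}(\chi, (T, D))$. By regionality of $\chi$ it is sufficient to prove that $\mathcal{M}^{*}(s, (T, D)) = \mathcal{M}^{*}(s', (T, D))$. By regional simplicity of $T$ and by Proposition 4, the functions $T(R)^{\oplus} : [s] \to \mathbb{R}$, for all $m = ([s], a, R) \in \mathcal{M}$, are simple. Then we have

$$\begin{aligned} \mathcal{M}^{*}(s, (T, D)) &= \operatorname*{arg\,max}^{\mathrm{lex}}_{m \in \mathcal{M}} \big\{ (T(R)^{\oplus}(s), D(R)^{\oplus}(s)) : m = ([s], a, R) \big\} \\ &= \operatorname*{arg\,max}^{\mathrm{lex}}_{m \in \mathcal{M}} \big\{ (T(R)^{\oplus}(s'), D(R)^{\oplus}(s')) : m = ([s'], a, R) \big\} \\ &= \mathcal{M}^{*}(s', (T, D)), \end{aligned}$$

where the second equality follows from $[s] = [s']$, regional constancy of $D$, and Lemma 3 applied to the (finite) set of functions $\{T(R)^{\oplus} : ([s], a, R) \in \mathcal{M}\}$. □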
Proof of Lemma 12 (Strict strategy improvement for Max). First we argue that $(T, D) \models \mathrm{Opt}_{\le}(\Gamma{\restriction}\chi')$, which by Proposition 11 implies that $(T, D) \le_{\mathrm{lex}} (T', D')$. Indeed, for every $s \in S \setminus F$, if $\chi(s) = ([s], a, R)$ and $\chi'(s) = ([s], a', R')$, then we have

$$(T(s), D(s)) = \big(T(R)^{\oplus}(s), D(R)^{\oplus}(s)\big) \le_{\mathrm{lex}} \big(T(R')^{\oplus}(s), D(R')^{\oplus}(s)\big),$$

where the equality follows from $(T, D) \models \mathrm{Opt}_{\mathrm{Min}}(\Gamma{\restriction}\chi)$, and the inequality follows from the definition of $\mathrm{Improve}_{\mathrm{Max}}$. Moreover, if $\chi \neq \chi'$ then there is $s \in S_{\mathrm{Max}} \setminus F$ for which the above inequality is strict. Then $(T, D) \not\models \mathrm{Opt}_{\mathrm{Min}}(\Gamma{\restriction}\chi')$, because every vertex in $\Gamma{\restriction}\chi'$ has a unique successor, and hence again by Proposition 11 we conclude that $(T, D) <_{\mathrm{lex}} (T', D')$. □



Proofs from Section 5

Proof of Lemma 19 (Complexity of strategy improvement). An $O(|\mathcal{R}|)$ algorithm for solving $\mathrm{Opt}(\Gamma_0)$ is implicit in the proof of Lemma 8.

Let $(T, D) \models \mathrm{Opt}_{\mathrm{Max}}(\Gamma_1)$, and for all $i \ge 0$, let $\chi_i \in \Lambda_{\mathrm{Max}}$ be the strategy in the $i$-th iteration of Algorithm 1 and let $(T_i, D_i) \models \mathrm{Opt}(\Gamma_1{\restriction}\chi_i)$. We claim that for every $i \ge 0$, if $D(R) = i$ then for all $j > i$ we have $(T_j(R), D_j(R)) = (T(R), D(R))$. This can be established by a routine induction on the values of the regionally constant function $D$. Observe that the finite values of the function $D$ are bounded by $|\mathcal{R}|$, because in the proof of Lemma 8 they are set to be the length of a simple path in a timed region graph. Algorithm 1 must therefore terminate no later than after $|\mathcal{R}| + 1$ iterations, because for every $i \ge 0$, in the $i$-th iteration there must be $R \in \mathcal{R}$ whose value $D(R)$ is set to $i$.

An analogous routine proof by induction on the value of $D$ can be used to prove that Algorithm 2 terminates in $O(|\mathcal{R}|)$ iterations. □

Proof of Theorem 21 (Complexity of reachability games on timed automata). In order to solve a reachability game on a timed automaton, it is sufficient to solve the reachability game on the finite region graph of the automaton. Observe that every region, and hence also every configuration of the game, can be written down in polynomial space, and that every move of the game can be simulated in polynomial time. Therefore the winner of the game can be determined by a straightforward alternating PSPACE algorithm, and hence the problem is in EXPTIME because APSPACE = EXPTIME.

In order to prove EXPTIME-hardness of solving reachability games on timed automata with two clocks, we reduce the EXPTIME-complete problem of solving countdown games [18] to it. Let $G = (N, M, \pi, n_0, B_0)$ be a countdown game, where $N$ is a finite set of nodes, $M \subseteq N \times N$ is a set of moves, $\pi : M \to \mathbb{N}_{>0}$ assigns a positive integer to every move, and $(n_0, B_0) \in N \times \mathbb{N}_{>0}$ is the initial configuration. In every move of the game from a configuration $(n, B) \in N \times \mathbb{N}_{>0}$, first player 1 chooses a number $p \in \mathbb{N}_{>0}$ such that $p \le B$ and $\pi(n, n') = p$ for some move $(n, n') \in M$, and then player 2 chooses a move $(n, n'') \in M$ such that $\pi(n, n'') = p$; the new configuration is then $(n'', B - p)$. Player 1 wins a play of the game when a configuration $(n, 0)$ is reached, and he loses (i.e., player 2 wins) when a configuration $(n, B)$ is reached in which player 1 is stuck, i.e., for all moves $(n, n') \in M$ we have $\pi(n, n') > B$.
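To pin down the rules just stated, here is a direct recursive solver for countdown games in Python. It is only an executable restatement of the game, not the timed-automaton reduction and not an efficient algorithm; encoding $M$ and $\pi$ as a dictionary from moves to durations is our assumption, and the names are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

Move = Tuple[int, int]  # (n, n') in M


def player1_wins(pi: Dict[Move, int], n0: int, B0: int) -> bool:
    """Decide a countdown game by memoised recursion over configurations (n, B)."""
    out: Dict[int, List[Tuple[int, int]]] = {}
    for (n, n2), p in pi.items():
        out.setdefault(n, []).append((n2, p))

    @lru_cache(maxsize=None)
    def win(n: int, B: int) -> bool:
        if B == 0:
            return True                   # configuration (n, 0): player 1 has won
        durations = {p for _, p in out.get(n, []) if p <= B}
        for p in durations:               # player 1 picks a duration p <= B ...
            # ... and must survive every reply (n, n'') with pi(n, n'') = p.
            if all(win(n2, B - p) for n2, q in out[n] if q == p):
                return True
        return False                      # stuck, or every choice of p is refuted
    return win(n0, B0)
```

Since $B$ drops by at least 1 in every round, the recursion visits at most $|N| \cdot (B_0 + 1)$ configurations, which is exponential in the binary encoding of $B_0$. For moves $\pi(0,1) = \pi(0,2) = 2$, $\pi(1,0) = 1$, $\pi(2,0) = 3$, player 1 wins from $(0, 5)$ but loses from $(0, 3)$, because player 2 can answer with the move to node 2, from which no move of duration at most 1 exists.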

We define the timed automaton $T_G = (L, C, S, A, E, \delta, \rho, F)$ by setting $C = \{b, c\}$; $S = L \times ([\![B_0]\!]_{\mathbb{R}})^2$; $A = P \cup M$, where $P = \pi(M)$ is the image of the function $\pi : M \to \mathbb{N}_{>0}$;

$$L = \{*\} \cup N \cup \{(n, p) : \text{there is } (n, n') \in M \text{ s.t. } \pi(n, n') = p\};$$

$$E(a) = \begin{cases} \{(n, \nu) : n \in N \text{ and } \nu(b) = B_0\} & \text{if } a = *, \\ \{(n, \nu) : \text{there is } (n, n') \in M \text{ s.t. } \pi(n, n') = p \text{ and } \nu(c) = 0\} & \text{if } a = p \in P, \\ \{((n, p), \nu) : \pi(n, n') = p \text{ and } \nu(c) = p\} & \text{if } a = (n, n') \in M; \end{cases}$$

$$\delta(\ell, a) = \begin{cases} * & \text{if } \ell = n \in N \text{ and } a = *, \\ (n, p) & \text{if } \ell = n \in N \text{ and } a = p \in P, \\ n' & \text{if } \ell = (n, p) \in N \times P \text{ and } a = (n, n') \in M; \end{cases}$$

$\rho(a) = \{c\}$ for every $a \in A$; and $F = \{*\} \times V$. Note that the timed automaton $T_G$ has only two clocks and that the clock $b$ is never reset.

Finally, we define the reachability game $\Gamma_G = (T_G, L_1, L_2)$ by setting $L_1 = N$ and $L_2 = L \setminus L_1$. It is routine to verify that player 1 has a winning strategy from state $(n_0, (0, 0)) \in S$ in the reachability game $\Gamma_G$ if and only if player 1 has a winning strategy (from the initial configuration $(n_0, B_0)$) in the countdown game $G$. □